[robocup-nao] Paper on Model-based RL on the Nao

Todd Hester todd at cs.utexas.edu
Tue Jan 19 18:14:53 EST 2010


** Apologies for duplicates **

Colleagues,

I'd like to draw your attention to a paper that will appear at this year's
International Conference on Robotics and Automation (ICRA).
In it, we have Nao robots learn to score penalty kicks via a model-based
reinforcement learning algorithm.  We perform learning both in the Webots
simulator and on the physical robots.  I hope that you will find it
interesting!

The paper can be found at:
http://www.cs.utexas.edu/~pstone/Papers/bib2html/b2hd-ICRA10-hester.html

And we have an accompanying video!
http://www.youtube.com/watch?v=mRpX9DFCdwI

I'm including the title and abstract below.  Feedback of any kind is most
welcome.

Thanks,
    Todd

==============================

Generalized Model Learning for Reinforcement Learning on a Humanoid Robot.
Todd Hester, Michael Quinlan, and Peter Stone.
International Conference on Robotics and Automation (ICRA), May 2010.

Abstract:
Reinforcement learning (RL) algorithms have long been promising methods for
enabling an autonomous robot to improve its behavior on sequential
decision-making tasks. The obvious enticement is that the robot should be
able to improve its own behavior without the need for detailed step-by-step
programming. However, for RL to reach its full potential, the algorithms
must be sample efficient: they must learn competent behavior from very few
real-world trials. From this perspective, model-based methods, which use
experiential data more efficiently than model-free approaches, are
appealing. But they often require exhaustive exploration to learn an
accurate model of the domain. In this paper, we present an algorithm,
Reinforcement Learning with Decision Trees (RL-DT), that uses decision trees
to learn the model by generalizing the relative effect of actions across
states. The agent explores the environment until it believes it has a
reasonable policy. The combination of the learning approach with the
targeted exploration policy enables fast learning of the model. We compare
RL-DT against standard model-free and model-based learning methods, and
demonstrate its effectiveness on an Aldebaran Nao humanoid robot scoring
goals in a penalty kick scenario.