[robocup-worldwide] Dissertation announcement: "Structured Exploration for Reinforcement Learning"

Nick Jong nkj at cs.utexas.edu
Fri Jan 7 13:53:21 EST 2011


** Apologies if you receive this multiple times **

Greetings,

I'm proud to announce that in December I successfully defended my PhD
thesis at the University of Texas at Austin.  Entitled

  "Structured Exploration for Reinforcement Learning," 

my thesis extends the model-based exploration of R-MAX to continuous
state spaces and to hierarchical RL.  The resulting algorithm, Fitted
R-MAXQ, tempers aggressive exploration with the structure inherent in
natural forms of domain knowledge, such as task hierarchies and distance
functions over the state space.  To the best of my knowledge, Fitted
R-MAXQ is the first reinforcement learning algorithm to simultaneously
address function approximation, model-based learning, and hierarchy.
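
For those unfamiliar with R-MAX, the following toy Python sketch conveys
the core idea as it applies to continuous state spaces: a state-action
pair counts as "known" once enough samples fall within some radius under
a distance function, and unknown pairs are valued optimistically to
drive exploration. Everything here (class name, constants, the choice of
a Euclidean metric) is invented for illustration; the actual Fitted
R-MAXQ implementation is the code linked below.

  import numpy as np

  R_MAX = 1.0     # assumed upper bound on one-step reward (illustrative)
  GAMMA = 0.95    # discount factor (illustrative)
  K_KNOWN = 5     # samples required before a state-action is "known"
  RADIUS = 0.1    # neighborhood radius for the distance function

  class RMaxStyleModel:
      """Toy R-MAX-style model with a distance-based "known" test."""

      def __init__(self, n_actions):
          # Per-action lists of previously visited states.
          self.samples = {a: [] for a in range(n_actions)}

      def record(self, state, action):
          self.samples[action].append(np.asarray(state, dtype=float))

      def is_known(self, state, action):
          # A query point is "known" once at least K_KNOWN samples lie
          # within RADIUS of it under a Euclidean distance function.
          state = np.asarray(state, dtype=float)
          near = sum(np.linalg.norm(s - state) <= RADIUS
                     for s in self.samples[action])
          return near >= K_KNOWN

      def optimistic_value(self, state, action, estimated_q):
          # Unknown state-actions are valued at the maximum achievable
          # return, which drives the agent to try them first.
          if self.is_known(state, action):
              return estimated_q
          return R_MAX / (1.0 - GAMMA)

  # Example: a few nearby visits make a point "known".
  model = RMaxStyleModel(n_actions=1)
  for _ in range(K_KNOWN):
      model.record([0.0, 0.0], action=0)
  print(model.optimistic_value([0.05, 0.0], 0, estimated_q=0.3))  # 0.3
  print(model.optimistic_value([5.0, 5.0], 0, estimated_q=0.3))   # 20.0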

I have published the code from my thesis work to the RL Library for the
benefit of those who might want to reproduce or extend my results.

  http://library.rl-community.org/wiki/Fitted_R-MAXQ

The thesis abstract is below and the full thesis is available at:

  http://www.cs.utexas.edu/forms/tech_reports/reports/tr/TR-2008.pdf

For a high-level overview, check out the annotated slides from my thesis
defense:

  http://www.cs.utexas.edu/users/nkj/thesis-defense-notes.pdf

I hope that my work will prove a useful resource for you! Please don't
hesitate to send me any feedback or questions.

Best regards,
Nick

===========

"Structured Exploration for Reinforcement Learning"

Reinforcement Learning (RL) offers a promising approach towards achieving the dream of autonomous agents that can behave intelligently in the real world.  Instead of requiring humans to specify correct behaviors or provide sufficient knowledge in advance, RL algorithms allow an agent to acquire the necessary knowledge through direct experience with its environment.  Early algorithms guaranteed convergence to optimal behaviors in limited domains, giving hope that simple, universal mechanisms would allow learning agents to succeed at solving a wide variety of complex problems.  In practice, the field of RL has struggled to apply these techniques successfully to the full breadth and depth of real-world domains.

This thesis extends the reach of RL techniques by demonstrating the synergies among certain key developments in the literature.  The first of these developments is model-based exploration, which facilitates theoretical convergence guarantees in finite problems by explicitly reasoning about an agent's certainty in its understanding of its environment.  A second branch of research studies function approximation, which generalizes RL to infinite problems by artificially limiting the degrees of freedom in an agent's representation of its environment.  The final major advance that this thesis incorporates is hierarchical decomposition, which seeks to improve the efficiency of learning by endowing an agent's knowledge and behavior with the gross structure of its environment.

Each of these ideas has intuitive appeal and sustains substantial independent research efforts, but this thesis defines the first RL agent that combines all their benefits in the general case.  In showing how to combine these techniques effectively, this thesis investigates the twin issues of generalization and exploration, which lie at the heart of efficient learning.  This thesis thus lays the groundwork for the next generation of RL algorithms, which will allow scientific agents to know when it suffices to estimate a plan from current data and when to accept the potential cost of running an experiment to gather new data.
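
To make the hierarchical decomposition above a little more concrete,
here is a toy Python sketch of a MAXQ-style task hierarchy. The names,
signatures, and the one-dimensional example are invented for
illustration and are not the thesis code; the point is only that a
composite task confines the agent's choices to its children until its
termination test holds, which is how a hierarchy channels exploration.

  from dataclasses import dataclass, field
  from typing import Callable, List

  @dataclass
  class Task:
      # One node of a MAXQ-style task hierarchy (illustrative names).
      # A task with no children is a primitive action.
      name: str
      terminated: Callable[[object], bool] = lambda state: True
      children: List["Task"] = field(default_factory=list)

  def execute(task, state, step, choose):
      # Primitive tasks take one environment step via step(state, name).
      # Composite tasks repeatedly let choose(task, state) pick a child
      # (where an R-MAX-style rule could direct exploration) and recurse.
      if not task.children:
          return step(state, task.name)
      while not task.terminated(state):
          state = execute(choose(task, state), state, step, choose)
      return state

  # Toy example: walk right on a line until reaching x = 3.
  right = Task("right")
  navigate = Task("navigate", terminated=lambda x: x >= 3,
                  children=[right])
  final = execute(navigate, 0,
                  step=lambda x, a: x + 1,            # toy environment
                  choose=lambda task, x: task.children[0])
  print(final)  # 3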

