Learning and Planning in Large Stochastic Environments
One research area that particularly interests me is learning to act: learning to perform sequential tasks in a complex environment with sensory feedback. For example, we would much rather program a mobile robot to learn its way around a building (and even to learn the dynamics of its own controllers) than program it directly for the task. Direct programming is tedious and error-prone, since it is incredibly difficult to foresee all of the circumstances the robot will encounter. And when you intend to drop your rover onto, say, Mars and let it go, autonomous control and learning become much more important.
This problem spawned the field of reinforcement learning which, in turn, found itself adopting (some might say reinventing) the Markov decision process (MDP) formalism and its nastier big sibling, the partially observable MDP, or POMDP. In their original operations research formulations, these models were used primarily for the control of discrete (in time and/or space) stochastic processes. The reinforcement learning community has added an interest in model identification and in acquiring optimal control policies through experience rather than through direct solution of the models.
Like many problems in AI and ML, the MDP and POMDP formulations of the reinforcement learning problem are well formed, plausible, mathematically compelling, very general and descriptive, and utterly intractable. We can solve small (roughly 100,000-state) MDP models directly and do reinforcement learning in spaces of similar size, but our methods depend largely on dynamic programming or matrix inversion over the entire, explicitly represented state space. When we move to propositional/factored state space representations (i.e., each state is represented as an assignment to propositional variables, or fluents), the state space explodes exponentially and our best-known methods quickly break down. Moving to relational/first-order representations makes things even worse, and POMDPs are intractable even for small state spaces.
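To make the flat-state-space bottleneck concrete, here is a minimal value iteration sketch in Python; the tiny chain-world transition and reward model is invented purely for illustration and isn't tied to any of the systems mentioned here. Every Bellman backup sweeps the entire, explicitly enumerated state space, which is exactly what stops working once states are described by many propositional variables.

```python
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.95

# T[a, s, s'] = P(s' | s, a); R[s, a] = expected immediate reward.
T = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    T[0, s, max(s - 1, 0)] = 1.0              # action 0: move "left" (deterministic)
    T[1, s, min(s + 1, n_states - 1)] = 0.9   # action 1: move "right" (noisy)
    T[1, s, s] += 0.1                         # ...and sometimes stays put
R = np.zeros((n_states, n_actions))
R[n_states - 1, :] = 1.0                      # reward for being in the last state

V = np.zeros(n_states)
for _ in range(1000):
    # One Bellman backup touches every state and action: the cost grows with
    # the size of the explicit state space, which is why factored spaces break this.
    Q = R + gamma * np.einsum("ast,t->sa", T, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new

print("optimal values:", V)
print("greedy policy :", Q.argmax(axis=1))
```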
I'm interested in scaling techniques for planning and learning in the MDP formalism. My recent work in this area, done while I was a postdoctoral researcher working with Leslie Kaelbling at the MIT AI Lab, concerned finding and exploiting "nearly deterministic" subcomponents within navigational MDPs. By finding sub-goals that the agent can reach with very high probability (1 - epsilon), we can reduce an exponentially large MDP path-planning problem to a polynomial series of tractably small MDPs plus a deterministic meta-planning problem. The meta-planning problem can then be solved with special-purpose planning software that runs much more quickly in large, propositional state spaces. I'm currently working on extending this work to scheduling applications, to POMDPs and continuous state space MDPs, and to model and sub-goal learning.
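As a toy, runnable illustration of the flavor of this decomposition, the sketch below splits a long corridor into short segments joined by sub-goals; the corridor world, sub-goal spacing, and local solver are invented for the example and are not my actual navigation domains or implementation.

```python
import numpy as np
from itertools import pairwise  # Python 3.10+

def solve_local_mdp(n, p=0.9, gamma=0.95):
    """Value-iterate a tiny n-state corridor whose goal is its right end.

    The single useful action ("right") succeeds with probability p and
    otherwise leaves the agent in place, so the greedy policy always reaches
    the local sub-goal eventually -- the near-determinism being exploited.
    """
    V = np.zeros(n)
    for _ in range(1000):
        V_new = V.copy()
        for s in range(n - 1):
            V_new[s] = -1.0 + gamma * (p * V[s + 1] + (1 - p) * V[s])
        if np.max(np.abs(V_new - V)) < 1e-8:
            return V_new
        V = V_new
    return V

# A 1000-state corridor is split by sub-goals every 10 states.  Each segment is
# a tractably small MDP; the meta-level treats "reach the next sub-goal" as a
# deterministic operator (here the meta-plan is just the obvious chain; in a
# real navigation graph it would come from a classical planner).
subgoals = list(range(0, 1001, 10))
local_values = {(a, b): solve_local_mdp(b - a + 1) for a, b in pairwise(subgoals)}
meta_plan = list(pairwise(subgoals))

print(f"{len(meta_plan)} local policies, each solved over at most "
      f"{max(b - a + 1 for a, b in meta_plan)} states instead of {subgoals[-1] + 1}.")
```

The point of the toy is the bookkeeping: each local problem stays small no matter how long the corridor grows, and because each local policy succeeds with probability near 1, the meta-level can afford to treat it as a single deterministic step.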
Resources
- I have compiled a reading list on scaling issues and methods for learning and planning in MDPs. The list is a work in progress -- if you have useful references for it, please send them to me.
- Matlab software for learning and planning in very small, gridworld-style maze navigation problems. It does not provide general-purpose MDP facilities; it is intended for small mazes and navigational problems, largely for prototyping purposes.
- Java MDP toolbox: A set of Java utilities for manipulating and planning in large, general-purpose MDPs. It includes support for atomic and factored state spaces; sparse, dynamic Bayes network (DBN), or decision tree transition functions; simulation and sampling; solution of small models via value or policy iteration; policy evaluation for small models; and hooks to support partially observable MDPs (POMDPs). There is currently no support for learning, but the infrastructure should accommodate it straightforwardly. The toolbox is available as a gzipped tar file of sources, though I'll probably release a jar file in the near future (especially if there's demand). It requires OR Objects v. 1.2.4 or later from OpsResearch.com (for sparse matrices and fundamental linear algebra operations) and the libtdl Java support library. (A rough sketch of the idea behind a factored, DBN-style transition function appears after this list.)
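For readers unfamiliar with factored transition models, here is a rough sketch (in Python, not the toolbox's Java API) of the idea behind a DBN-style transition function: each state variable's next value is sampled from a small conditional distribution over a few parent variables, so the full transition matrix is never enumerated. The fluent names and probabilities below are invented for illustration.

```python
import random

# A state is an assignment to boolean fluents; the fluent names are invented here.
def sample_next_state(state, action):
    """Sample s' ~ P(. | s, a) one fluent at a time from small conditional
    distributions, so the full |S| x |S| transition matrix is never built."""
    next_state = dict(state)
    if action == "open_door":
        # P(door_open' = true | at_door, holding_key): one DBN CPT entry.
        p = 0.95 if state["at_door"] and state["holding_key"] else 0.0
        next_state["door_open"] = state["door_open"] or random.random() < p
    # ...other fluents persist or follow their own small CPTs.
    return next_state

s = {"holding_key": True, "door_open": False, "at_door": True}
print(sample_next_state(s, "open_door"))
```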
