From Multi-robot Pursuit to Algorithmic Sorting: Reinforcement Learning for High-Dimensional Problems

Manually deriving optimal robot motion is difficult when a robot must balance its actions between opposing preferences. A common solution is to learn the task so that near-optimal motions are performed automatically. Learning has been applied to problems such as swing-free UAV flight, table tennis, and autonomous driving. One learning method in particular, reinforcement learning (RL), has proven highly successful at learning robot motion parameters through experimentation. However, high-dimensional problems often prove challenging for RL. We address the dimensionality constraint with PrEference Appraisal Reinforcement Learning (PEARL), which projects the high-dimensional continuous robot state space onto a low-dimensional preference feature space. This talk formalizes PEARL's polymorphic feature selection and generalizes it to high-dimensional multi-robot systems and to software engineering for automated computing agents. We demonstrate the approach first on a robotics task, a multi-agent pursuit problem, where we solve a 1000-agent pursuit in the agents' joint continuous state and action space (a 4000-dimensional vector space). We then apply the same approach to a computing problem, array sorting, where we develop an RL sorting agent and assess its robustness to unreliable components (100-dimensional vectors).
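
To make the projection idea concrete, the sketch below shows one way a PEARL-style feature map might reduce a 4000-dimensional joint pursuit state to a handful of preference features. It is a minimal illustration only: the per-agent layout [x, y, vx, vy], the two preferences ("close the distance to the prey", "keep speed low"), and all names are assumptions for this example, not the talk's actual feature definitions.

```python
import numpy as np

def preference_features(joint_state: np.ndarray, prey_pos: np.ndarray) -> np.ndarray:
    """Project a (4 * n_agents)-dimensional joint state onto 2 preference features.

    Hypothetical sketch: assumes each pursuer contributes a 4-D sub-state
    [x, y, vx, vy]; the real PEARL features may differ.
    """
    agents = joint_state.reshape(-1, 4)             # one row per pursuer
    positions, velocities = agents[:, :2], agents[:, 2:]
    dist_to_prey = np.linalg.norm(positions - prey_pos, axis=1).mean()
    mean_speed = np.linalg.norm(velocities, axis=1).mean()
    # Negate so that larger feature values correspond to more-preferred states.
    return np.array([-dist_to_prey, -mean_speed])

# A 1000-agent pursuit state is a 4000-dimensional vector; learning then
# happens over the 2-dimensional feature vector instead.
state = np.random.randn(4000)
phi = preference_features(state, prey_pos=np.zeros(2))
print(phi.shape)  # (2,)
```

With a map like this, the value function is learned over the low-dimensional feature vector rather than the raw joint state, which is what makes the 1000-agent problem tractable for RL.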