Abstract— Balancing an inverted pendulum
on an unmanned aerial vehicle (UAV) has been achieved using
linear and nonlinear control approaches. However, to the best of
our knowledge, this problem has not been solved using learning
methods. On the other hand, the classical inverted pendulum is
a common benchmark problem to evaluate learning techniques.
In this paper we demonstrate a novel solution to the inverted
pendulum problem extended to UAVs, specifically quadrotors.
This complex system is underactuated and sensitive to small
acceleration changes of the quadrotor. The solution is provided
by reinforcement learning (RL), a framework commonly applied
to solve nonlinear control problems. We generate a control
policy to balance the pendulum using Continuous Action Fitted
Value Iteration (CAFVI) [1], an RL algorithm for
high-dimensional input spaces. This technique combines learning of
both state and state-action value functions in an approximate
value iteration setting with continuous inputs. Simulations verify
the performance of the generated control policy for varying initial
conditions. The results show the control policy is computationally
fast enough for real-time control.
Index Terms— Aerial robotics, quadrotor control, inverted pendulum,
approximate value iteration, reinforcement learning.