Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task. We proposed an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization. To learn option policies that correspond to modes of the advantage function, we introduced advantage-weighted importance sampling.
T. Osa, V. Tangkaratt, and M. Sugiyama. Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization,
International Conference on Learning Representations (ICLR), 2019, to appear
[ arXiv ]
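The advantage-weighted importance sampling mentioned above can be sketched as follows: samples are reweighted so that high-advantage samples dominate, which concentrates learning on modes of the advantage function. This is a minimal illustration with a hypothetical temperature parameter `beta`, not the paper's implementation.

```python
import numpy as np

def advantage_weights(advantages, beta=1.0):
    """Normalized importance weights proportional to exp(A / beta).

    Samples with larger advantage receive larger weight, so estimates
    computed under these weights emphasize modes of the advantage
    function. `beta` is an illustrative temperature parameter.
    """
    a = np.asarray(advantages, dtype=float)
    z = (a - a.max()) / beta        # shift by the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Example: three samples, the last with the highest advantage
w = advantage_weights([0.1, 0.5, 2.0], beta=0.5)
```

Lowering `beta` makes the weighting sharper, pushing the effective sample set toward the highest-advantage mode.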
We developed a motion planning framework that combines the advantages of optimization-based and demonstration-based methods. A distribution of trajectories demonstrated by human experts is used to guide the trajectory optimization process in our framework. The resulting trajectory maintains the demonstrated behaviors, which are essential to performing the task successfully, while adapting the trajectory to avoid obstacles. In simulated experiments and with a real robotic system, we verify that our approach optimizes the trajectory to avoid obstacles and encodes the demonstrated behavior in the resulting trajectory.
T. Osa, A. M. Ghalamzan, E., R. Stolkin, R. Lioutikov, J. Peters, and G. Neumann. Guiding Trajectory Optimization by Demonstrated Distributions, IEEE Robotics and Automation Letters (RA-L), Vol.2, No.2, pages 819-826, 2017.
[ paper ]
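One way to picture guiding trajectory optimization by a demonstrated distribution is as gradient descent on a cost that trades off obstacle avoidance against the negative log-likelihood of the trajectory under a Gaussian fit to the demonstrations. The sketch below uses illustrative names (`obstacle_grad`, `lam`) and a plain Gaussian model; the paper's actual formulation may differ.

```python
import numpy as np

def guided_trajectory_step(xi, mu, Sigma_inv, obstacle_grad, alpha=0.1, lam=1.0):
    """One gradient step on: obstacle cost + lam * neg. log-likelihood
    of the trajectory parameters xi under the demonstrated
    distribution N(mu, Sigma). All names are illustrative.
    """
    # Gradient of -log N(xi; mu, Sigma) with respect to xi
    demo_grad = Sigma_inv @ (xi - mu)
    return xi - alpha * (obstacle_grad(xi) + lam * demo_grad)
```

With no obstacles nearby (`obstacle_grad` returning zero), repeated steps pull the trajectory back toward the demonstrated mean, which is what "encoding the demonstrated behavior" amounts to in this simplified picture.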
We developed a framework for hierarchical reinforcement learning of grasping policies. In our framework, the lower-level hierarchy learns multiple grasp types, and the upper-level hierarchy learns a policy to select from the learned grasp types according to a point cloud of a new object. Through experiments, we validate that our approach learns grasping by constructing the grasp dataset autonomously. The experimental results show that our approach learns multiple grasping policies and generalizes the learned grasps by using local point cloud information.
T. Osa, J. Peters, and G. Neumann. Experiments with Hierarchical Reinforcement Learning of Multiple Grasping Policies, Proceedings of the International Symposium on Experimental Robotics (ISER), 2016.
[ paper ]
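The upper-level hierarchy described above can be pictured as a selection policy: given features of a new object's point cloud, pick the learned grasp type whose value estimator predicts the highest success. The dictionary structure and names here are purely illustrative, not the paper's code.

```python
def select_grasp_type(features, value_estimators):
    """Upper-level policy: choose the grasp type whose learned value
    estimator scores the object's local point-cloud features highest.

    `value_estimators` maps a grasp-type name to a callable returning
    a predicted success score (illustrative structure).
    """
    scores = {g: v(features) for g, v in value_estimators.items()}
    return max(scores, key=scores.get)
```

The lower-level policies (the grasp types themselves) are learned separately; this selector is only the routing layer on top of them.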
This study presents a framework for online trajectory planning and force control by learning from demonstrations. By leveraging demonstrations under various conditions, we can model the conditional distribution of the trajectories given the task condition. This scheme enables generalization of the trajectories of spatial motion and contact force to new conditions in real time. In addition, we propose a force tracking controller that robustly and stably tracks the planned trajectory of the contact force by learning the spatial motion and contact force simultaneously.
T. Osa, N. Sugita, and M. Mitsuishi. Online Trajectory Planning and Force Control for Automation of Surgical Tasks, IEEE Transactions on Automation Science and Engineering, 2017.
[ paper ]
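Modeling the conditional distribution of trajectories given the task condition can be illustrated with standard Gaussian conditioning: fit a joint Gaussian over (task condition s, trajectory parameters w) from the demonstrations, then condition on an observed s at run time. This is a textbook sketch; the paper's actual trajectory model may be richer.

```python
import numpy as np

def condition_gaussian(mu, Sigma, ds, s):
    """Condition a joint Gaussian N(mu, Sigma) over (s, w) on an
    observed task condition s (the first `ds` dimensions), returning
    the mean and covariance of w | s. Standard Gaussian conditioning.
    """
    mu_s, mu_w = mu[:ds], mu[ds:]
    S_ss = Sigma[:ds, :ds]
    S_ws = Sigma[ds:, :ds]
    S_ww = Sigma[ds:, ds:]
    K = S_ws @ np.linalg.inv(S_ss)          # regression gain
    mu_cond = mu_w + K @ (s - mu_s)         # conditional mean
    Sigma_cond = S_ww - K @ S_ws.T          # conditional covariance
    return mu_cond, Sigma_cond
```

Because conditioning is a closed-form matrix computation, it can be evaluated in real time, which matches the online-generalization claim in the abstract.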