Existing motion planning methods often have two drawbacks: 1) the goal configuration must be specified by a user, and 2) only a single solution is generated under a given condition. In practice, however, it is not trivial for a user to specify the optimal goal configuration. In addition, the objective function used in trajectory optimization is often non-convex and can have multiple solutions that achieve comparable costs. In this study, we propose a framework that finds multiple trajectories corresponding to the different modes of the cost function.
T. Osa. Multimodal Trajectory Optimization,
The International Journal of Robotics Research (IJRR), accepted.
[ arXiv ]
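The core idea of discovering multiple modes can be illustrated with a generic stand-in technique: restarting local trajectory optimization from random initializations and keeping the distinct local minima. The 2D obstacle scene, cost weights, and mode labels below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy setting (an assumption, not the paper's formulation): a 2D trajectory
# from (0, 0) to (1, 0) must avoid a circular obstacle at (0.5, 0), so the
# smoothness-plus-clearance cost has at least two modes (pass above / below).
N = 8                      # number of free waypoints
start, goal = np.array([0.0, 0.0]), np.array([1.0, 0.0])
obs_c, obs_r = np.array([0.5, 0.0]), 0.2

def cost(flat):
    pts = np.vstack([start, flat.reshape(N, 2), goal])
    smooth = np.sum(np.diff(pts, axis=0) ** 2)                  # path-length proxy
    d = np.linalg.norm(pts - obs_c, axis=1)
    clearance = np.sum(np.maximum(0.0, obs_r + 0.1 - d) ** 2)   # hinge penalty
    return smooth + 100.0 * clearance

# Restart local optimization from random initializations and record which
# side of the obstacle each local minimum passes on.
rng = np.random.default_rng(0)
modes = set()
for _ in range(20):
    x0 = rng.normal(scale=0.3, size=2 * N)
    res = minimize(cost, x0, method="L-BFGS-B")
    mid_y = res.x.reshape(N, 2)[N // 2, 1]
    modes.add("above" if mid_y > 0 else "below")
print(sorted(modes))   # typically both homotopy classes are found
```

Each restart converges to a nearby local minimum, so clustering the results (here, trivially by the sign of the middle waypoint) exposes the distinct modes of the non-convex cost.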
Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying a hierarchical policy structure that enhances RL performance is challenging. We propose an HRL method that learns the latent variable of a hierarchical policy through mutual information maximization.
T. Osa, V. Tangkaratt, and M. Sugiyama. Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization,
International Conference on Learning Representations (ICLR), 2019.
[ arXiv ]
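The mutual-information objective behind such latent-variable discovery is commonly optimized through a variational lower bound, I(Z; S) ≥ E[log q(Z|S)] + H(Z), where q is a learned classifier predicting the latent from the state. The sketch below demonstrates only this generic bound on toy data; the synthetic states, the logistic classifier, and the training loop are assumptions, not the paper's advantage-weighted objective.

```python
import numpy as np

# Variational lower bound on mutual information for option discovery:
#     I(Z; S) >= E[log q(Z|S)] + H(Z)
# with a learned classifier q. Data and classifier here are toy assumptions.
rng = np.random.default_rng(1)
n = 500
z = rng.integers(0, 2, size=n)                                # latent "option", H(Z) = log 2
s = rng.normal(loc=np.where(z == 0, -1.0, 1.0), scale=0.5)    # state depends on z

w, b = 0.0, 0.0                                # logistic classifier q(z=1|s)
for _ in range(200):                           # gradient ascent on log q(z|s)
    p = 1.0 / (1.0 + np.exp(-(w * s + b)))
    grad = z - p
    w += 0.1 * np.mean(grad * s)
    b += 0.1 * np.mean(grad)

p = 1.0 / (1.0 + np.exp(-(w * s + b)))
logq = np.where(z == 1, np.log(p + 1e-12), np.log(1 - p + 1e-12))
mi_bound = np.mean(logq) + np.log(2)           # in nats; > 0 once q is informative
print(f"MI lower bound ~ {mi_bound:.3f} nats")
```

As the classifier gets better at inferring the latent from the state, the bound tightens toward H(Z) = log 2, which is the sense in which maximizing it encourages latent variables that induce distinguishable behaviors.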
We developed a motion planning framework that combines the advantages of optimization-based and demonstration-based methods. A distribution of trajectories demonstrated by human experts is used to guide the trajectory optimization process in our framework. The resulting trajectory maintains the demonstrated behaviors, which are essential to performing the task successfully, while adapting the trajectory to avoid obstacles.
T. Osa, A. M. Ghalamzan, E., R. Stolkin, R. Lioutikov, J. Peters, and G. Neumann. Guiding Trajectory Optimization by Demonstrated Distributions, IEEE Robotics and Automation Letters (RA-L), Vol.2, No.2, pages 819-826, 2017.
[ paper ]
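One way to picture guidance by a demonstrated distribution is to fit a Gaussian over demonstrated waypoint vectors and add its negative log-likelihood as a soft term in the trajectory objective. This is a minimal sketch of that idea only; the 1-D trajectories, the obstacle cost, and the weights are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Fit a Gaussian over demonstrated waypoint vectors, then penalize deviation
# from it (its negative log-likelihood) alongside a toy obstacle cost.
rng = np.random.default_rng(0)
T = 5
demos = 0.5 * np.sin(np.linspace(0, np.pi, T)) + 0.05 * rng.normal(size=(10, T))
mu = demos.mean(axis=0)
cov = np.cov(demos, rowvar=False) + 1e-4 * np.eye(T)   # regularized covariance
cov_inv = np.linalg.inv(cov)

def objective(x):
    guidance = 0.5 * (x - mu) @ cov_inv @ (x - mu)       # -log N(x; mu, cov) + const
    obstacle = np.sum(np.exp(-((x - 0.2) ** 2) / 0.01))  # toy cost: stay away from 0.2
    return guidance + 5.0 * obstacle

res = minimize(objective, mu, method="L-BFGS-B")
print(np.round(res.x, 3))   # stays near the demo mean while avoiding 0.2
```

Because the guidance term is weighted by the inverse covariance, waypoints where the demonstrations agree tightly are held close to the demonstrated behavior, while high-variance waypoints are free to move away from obstacles.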
We developed a framework for hierarchical reinforcement learning of grasping policies. In our framework, the lower-level hierarchy learns multiple grasp types, and the upper-level hierarchy learns a policy to select from the learned grasp types according to a point cloud of a new object. Through experiments, we validate that our approach learns grasping by constructing the grasp dataset autonomously. The experimental results show that our approach learns multiple grasping policies and generalizes the learned grasps by using local point cloud information.
T. Osa, J. Peters, and G. Neumann. Experiments with Hierarchical Reinforcement Learning of Multiple Grasping Policies, Proceedings of the International Symposium on Experimental Robotics (ISER), 2016.
[ paper ]
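The two-level structure can be sketched schematically: lower-level primitives implement grasp types, and an upper level selects among them from point-cloud information. Everything below is an illustrative assumption (a hand-coded shape rule stands in for the learned selection policy, and the primitives are stubs), not the paper's learned hierarchy.

```python
import numpy as np

# Lower level: stub grasp-type primitives (illustrative assumptions).
def top_grasp(cloud):
    return ("top", cloud.mean(axis=0))

def side_grasp(cloud):
    return ("side", cloud.mean(axis=0))

# Upper level: select a primitive from a crude point-cloud shape feature.
# A hand-coded rule stands in for the learned selection policy.
def select_and_grasp(cloud):
    extent = cloud.max(axis=0) - cloud.min(axis=0)
    primitive = side_grasp if extent[2] > max(extent[0], extent[1]) else top_grasp
    return primitive(cloud)

rng = np.random.default_rng(0)
box = rng.uniform([0, 0, 0], [0.1, 0.1, 0.3], size=(200, 3))   # tall object
print(select_and_grasp(box)[0])
```

The point of the structure is that new grasp types can be added at the lower level without changing the upper-level interface, which only maps point-cloud features to a choice of primitive.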
This study presents a framework for online trajectory planning and force control by learning from demonstrations. By leveraging demonstrations performed under various conditions, we model the conditional distribution of the trajectories given the task condition. This scheme enables the trajectories of spatial motion and contact force to be generalized to new conditions in real time. In addition, by learning the spatial motion and contact force simultaneously, we propose a force tracking controller that robustly and stably tracks the planned trajectory of the contact force.
T. Osa, N. Sugita, and M. Mitsuishi. Online Trajectory Planning and Force Control for Automation of Surgical Tasks, IEEE Transactions on Automation Science and Engineering, 2017.
[ paper ]
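Conditioning a distribution of demonstrated trajectories on a task condition can be sketched with plain Gaussian conditioning: fit a joint Gaussian over [condition, trajectory] and condition on a new task parameter. The synthetic demonstrations and the scalar condition below are assumptions for illustration, not the paper's data or model.

```python
import numpy as np

# Fit a joint Gaussian over [task condition c, trajectory w] from synthetic
# demonstrations, then condition on a new c to generalize the trajectory.
rng = np.random.default_rng(0)
n, T = 50, 4
c = rng.uniform(0.0, 1.0, size=(n, 1))                          # task condition
w = c * np.linspace(1, 4, T) + 0.05 * rng.normal(size=(n, T))   # demo trajectories

data = np.hstack([c, w])
mu = data.mean(axis=0)
S = np.cov(data, rowvar=False)

def condition(c_new):
    # p(w | c) mean for a joint Gaussian: mu_w + S_wc S_cc^{-1} (c_new - mu_c)
    S_cc, S_wc = S[0, 0], S[1:, 0]
    return mu[1:] + S_wc / S_cc * (c_new - mu[0])

print(np.round(condition(0.5), 2))   # close to 0.5 * [1, 2, 3, 4]
```

Because conditioning a Gaussian is a closed-form linear operation, a new trajectory can be produced for an unseen task condition in real time, which is the property the online planning scheme relies on.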