Virtual humanoids learning motion skills

Proposed by: Olivier Sigaud
Thesis supervisor: Olivier Sigaud
Research unit: UMR 7222 Institut des Systèmes Intelligents et de Robotique

Field: Information and communication sciences and technologies


Human motion is highly complex. It relies on many abilities developed over a long learning process: advanced balance, good coordination, and the ability to exploit inertial effects, to rapidly plan new contacts and to precisely control interaction forces. In a nutshell, the objective of this PhD thesis is to design dedicated learning algorithms that enable virtual humanoids to autonomously build repertoires of motion skills, with the aim of acquiring these abilities. We will explore machine learning for humanoid motion as a field in its own right. This differs markedly from existing research, which develops either optimization-based methods that rely on machine learning only for parameter tuning, or generic learning frameworks that struggle to cope with the specific complexities of humanoid motion. The overall objective is to design unsupervised and reinforcement learning algorithms that can acquire motion skills and deploy them appropriately. Eventually, the virtual humanoid should be able to generate complex motions from simple inputs. For instance, defining a few waypoints for the head trajectory should lead to the automatic generation of walking motions and jumps. The research will be organized around three axes corresponding to three important ingredients of robot motion: motion features and state representations, motion primitives, and skills sequencing and abstraction.

1. Motion features and state representations.

Many efficient algorithms for bipedal walking and balance are based on simplified models of the dynamics and on the control of meaningful quantities such as the Zero Moment Point [1], the Capture Point [2], or the centroidal angular momentum [3], to name just a few. This PhD thesis will explore learning algorithms that take as inputs not only raw joint angles, velocities and torques, but also carefully selected vectors of physically meaningful measurements. This raises several challenges, such as evaluating the usefulness of these redundant inputs, or constructing new relevant features based on physical principles. Additionally, efforts will be made to extend to humanoid motion recent work that proposed to learn appropriate state representations based on priors that are specific to robotics [4] (for instance, the proportionality between control inputs and the rate of change of some features).
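
As an illustration of what such physically meaningful inputs could look like, the following Python sketch computes the Zero Moment Point and the Capture Point along one horizontal axis under the linear inverted pendulum model (constant center-of-mass height, no angular momentum), and appends them to the raw joint measurements. The function names and the layout of the feature vector are assumptions made for this example; the proposal does not prescribe a particular feature set.

```python
import math

GRAVITY = 9.81  # m/s^2


def lipm_features(com_pos, com_vel, com_acc, com_height):
    """Return (zmp, capture_point) along one horizontal axis.

    Assumes the linear inverted pendulum model: the center of mass (CoM)
    stays at a constant height and angular momentum is neglected.
    """
    omega = math.sqrt(GRAVITY / com_height)  # natural frequency of the pendulum
    zmp = com_pos - com_acc / omega ** 2     # from x'' = omega^2 * (x - zmp)
    capture_point = com_pos + com_vel / omega
    return zmp, capture_point


def augmented_state(joint_angles, joint_velocities,
                    com_pos, com_vel, com_acc, com_height):
    """Hypothetical learner input: raw joint data plus the two features."""
    zmp, capture_point = lipm_features(com_pos, com_vel, com_acc, com_height)
    return list(joint_angles) + list(joint_velocities) + [zmp, capture_point]
```

For example, augmented_state([0.1, -0.2], [0.0, 0.3], 0.05, 0.4, 0.0, 0.8) returns the four joint values followed by the two features. Such inputs are redundant with the joint data, so whether they actually help the learner is exactly the kind of question this axis intends to evaluate.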

2. Motion primitives.

Convolutional neural networks (CNNs) have proved very useful in practice for image processing [5]. Convolutions are used as a generic and flexible operation that achieves meaningful dimensionality reduction by exploiting the link between proximity and information redundancy in pixel grids. But the sensory input used in the control of virtual or physical humanoid robots is very different from images, especially in the context of this PhD thesis, in which visual input will never be considered (in simulation, the robot posture and configuration are always known). Furthermore, the (control) output is much richer than in classical applications of reinforcement learning. This PhD proposes to search for types of computations that, similarly to CNNs for image processing, will improve the efficiency of learning algorithms for humanoid motion control. Several types of computations will be considered, all relying on specificities of humanoids, such as their underactuation, redundancy and hierarchical kinematics. Considering relevant functions should help make learning for controller design more tractable. Interesting results have been obtained with the classical framework of dynamic movement primitives [6], but the use of fixed parametrized structures for the controllers limits the flexibility and expressiveness of the skills that can be learned. To avoid this limitation while keeping complexity low, we will build upon recent work [7] that introduces a diffeomorphic matching algorithm to deform vector fields. This offers a flexible way to incrementally modify time-invariant controllers without losing key topological properties such as asymptotic stability. Another research direction that will be considered is the incorporation into learning processes of two very successful approaches for humanoid robot trajectory generation: motion planning and optimization (see [8]).
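
To make the notion of a fixed parametrized controller structure concrete, here is a minimal Python sketch of a one-dimensional discrete dynamic movement primitive in the spirit of [6]: a critically damped spring pulls the state toward the goal, and a learnable forcing term, gated by a decaying phase variable, shapes the transient. The gain values, the basis-width heuristic, the Euler integration and the function name are assumptions chosen for this example, not settings taken from the proposal.

```python
import math


def rollout_dmp(y0, goal, weights, tau=1.0, dt=0.001, duration=2.0,
                alpha_z=25.0, beta_z=6.25, alpha_x=4.0):
    """Integrate a 1-D discrete DMP with explicit Euler steps.

    `weights` are the learnable parameters of the forcing term; with all
    weights at zero the system is a plain critically damped spring.
    """
    n = len(weights)
    # Basis centres evenly spaced in time, hence exponentially in phase x.
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c for c in centres]  # heuristic widths (assumption)

    y, z, x = y0, 0.0, 1.0  # position, scaled velocity, phase
    trajectory = [y]
    for _ in range(round(duration / dt)):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        total = sum(psi)
        forcing = 0.0
        if total > 1e-12:
            mix = sum(p * w for p, w in zip(psi, weights)) / total
            forcing = mix * x * (goal - y0)  # vanishes as the phase decays
        z_dot = (alpha_z * (beta_z * (goal - y) - z) + forcing) / tau
        y_dot = z / tau
        x_dot = -alpha_x * x / tau
        z += z_dot * dt
        y += y_dot * dt
        x += x_dot * dt
        trajectory.append(y)
    return trajectory
```

Calling rollout_dmp(0.0, 1.0, [50.0, -30.0, 20.0, 0.0, 10.0]) yields a trajectory whose shape depends on the weights but which still settles close to the goal, because the forcing term is multiplied by the decaying phase. The structure itself (spring plus gated basis functions) is fixed, which is the source of the limited expressiveness discussed above.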

3. Skills sequencing and abstraction.

This axis concerns the orchestration of all the learning tasks and the organization of the end-to-end framework that defines and trains new skills and builds a repertoire from which the humanoid can draw the appropriate skills depending on the situation and on the task to achieve. Following what humans seem to do, expert methods for humanoid motion generation are usually based on several layers of algorithms connecting low-level controllers to higher-level decision modules. These global approaches can be called "tiered strategies". The PhD candidate will design algorithms for skills discovery and sequencing based on such tiered strategies, following for instance the options framework [9] (which adds to reinforcement learning principled methods for planning and learning using high-level skills [10]), with specific hierarchies of skills suitable for humanoid motion control.
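
In the options framework [9], a skill is described by an initiation set, an intra-option policy and a termination condition. The Python sketch below illustrates this structure on a toy one-dimensional corridor. The Option class, the corridor dynamics and the two example skills are hypothetical and serve only to show how a higher-level module could sequence skills; in particular, the termination condition is deterministic here, whereas it is a probability in the general framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    name: str
    can_start: Callable[[int], bool]    # initiation set
    policy: Callable[[int], int]        # intra-option policy: state -> action
    should_stop: Callable[[int], bool]  # termination condition (deterministic)


def corridor_step(state, action):
    """Toy dynamics: positions 0..10 on a line, actions are displacements."""
    return max(0, min(10, state + action))


def run_option(option, state, step=corridor_step, max_steps=100):
    """Execute one option until it terminates; return (state, steps taken)."""
    if not option.can_start(state):
        raise ValueError(f"option {option.name!r} cannot start in state {state}")
    steps_taken = 0
    while steps_taken < max_steps:
        state = step(state, option.policy(state))
        steps_taken += 1
        if option.should_stop(state):  # checked on arrival in the new state
            break
    return state, steps_taken


# Two hand-written skills; in the thesis they would be learned.
walk = Option("walk", lambda s: s < 5, lambda s: 1, lambda s: s >= 5)
jump = Option("jump", lambda s: s == 5, lambda s: 2, lambda s: s >= 9)


def run_sequence(options, state):
    """A fixed high-level plan: run the given options one after another."""
    total_steps = 0
    for option in options:
        state, steps = run_option(option, state)
        total_steps += steps
    return state, total_steps
```

Here run_sequence([walk, jump], 0) chains the two skills, the second one being applicable only in the state where the first one terminates. Learning which option to select in which state, and discovering the options themselves, is what replaces the fixed plan in an actual tiered strategy.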


Significant progress has been made in machine learning in the past few years, with deep learning techniques surpassing the previous state of the art in many applications. Combining the need for a great variety of skills, high dimensionality in input and output, and complicated problems related to balance, contacts, redundancy and underactuation, humanoid robotics is a perfect example of an application that can benefit greatly from machine learning. Nevertheless, it seems too complex to be solved by a direct application of general learning techniques. By focusing on the specificities of humanoid robotics and on mixing learning with more classical model-based control methods, we hope to make the generation of complex and physically realistic humanoid motions much faster and simpler than with current state-of-the-art methods.

As far as concrete outcomes are concerned, having virtual characters that autonomously learn and organize motion skills would ease the editing of physically realistic humanoid motion, which in turn could lead to applications in the video game and computer animation industries, as well as in ergonomics for the manufacturing industry. Furthermore, attention will be paid to the transferability of the skills to real humanoid platforms.

Additional remarks

The day-to-day supervisor will be Nicolas Perrin.

References:

[1] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, H. Hirukawa, "Biped walking pattern generation by using preview control of zero-moment point", IEEE International Conference on Robotics and Automation (ICRA), pp. 1620–1626, 2003.

[2] J. Pratt, J. Carff, S. Drakunov, A. Goswami, "Capture point: A step toward humanoid push recovery", IEEE-RAS International Conference on Humanoid Robots (Humanoids), pp. 200–207, 2006.

[3] A. Goswami, V. Kallem, "Rate of change of angular momentum and balance maintenance of biped robots", IEEE International Conference on Robotics and Automation (ICRA), pp. 3785–3790, 2004.

[4] R. Jonschkowski, O. Brock, "State Representation Learning in Robotics: Using Prior Knowledge about Physical Interaction", Robotics: Science and Systems (RSS), 2014.

[5] A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet classification with deep convolutional neural networks", Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105, 2012.

[6] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, S. Schaal, "Dynamical movement primitives: learning attractor models for motor behaviors", Neural Computation, 25(2):328–373, 2013.

[7] N. Perrin, Ph. Schlehuber-Caissier, "Fast diffeomorphic matching to learn globally asymptotically stable nonlinear dynamical systems", Systems & Control Letters, 96:51–59, 2016.

[8] A. Escande, N. Mansard, P.-B. Wieber, "Hierarchical quadratic programming: Fast online humanoid-robot motion generation", The International Journal of Robotics Research (IJRR), 33(7):1006–1028, 2014.

[9] R. S. Sutton, D. Precup, S. Singh, "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning", Artificial Intelligence, 112(1-2):181–211, 1999.

[10] G. Konidaris, A. G. Barto, "Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining", Advances in Neural Information Processing Systems (NIPS), pp. 1015–1023, 2009.
