Robots constructing representations from experience

Proposed by: Stéphane Doncieux
Thesis supervisor: Stéphane Doncieux
Research unit: UMR 7222 Institut des Systèmes Intelligents et de Robotique

Field: Information and communication sciences and technologies


Designing algorithms that provide robots or agents with autonomy is often a complicated and tedious endeavor. To make it easier, many learning algorithms exist that can optimize over time the parameters of the mapping from perceptions to actions, with the objective to increase some measure of the quality of the behavior, like for instance the ability to collect a reward.

However, the mapping from perceptions to actions usually has a fixed structure, with internal representations that are not modified during learning. The success of the learning process depends largely on these representations, and it would be desirable for learning algorithms to be able to infer appropriate representations directly from experience.

The goal of the proposed PhD is to explore different research directions for the automated construction of representations via an a posteriori analysis of sensorimotor traces.


The idea is to start with a classical learning algorithm with fixed representations and, based on the results obtained, extract interesting features from the sensorimotor recordings (the traces) and construct new representations that seem appropriate for the tasks the robot is trying to perform. The usefulness of the proposed representations will then be quantitatively assessed by answering the following two questions:

  • Do the new representations yield a speed-up when relearning from scratch?
  • Do the new representations yield better transferability when, after the learning process has completed, the robot is suddenly placed in a new environment?
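As a minimal illustration of the first criterion (the toy chain task, the `phi` representation map, and all function names below are hypothetical, not part of the proposal), one can compare how quickly a tabular learner re-solves a task under two candidate state representations:

```python
import random

def chain_q_learning(n_states, phi, episodes=200, eps=0.2, alpha=0.5,
                     gamma=0.95, seed=0):
    """Tabular Q-learning on a 1-D chain: the agent starts at state 0 and
    is rewarded for reaching the rightmost state. `phi` maps raw states to
    the internal representation indexing the Q-table. Returns the total
    number of steps over all episodes (fewer steps = faster relearning)."""
    rng = random.Random(seed)
    goal, total_steps, Q = n_states - 1, 0, {}
    q = lambda s, a: Q.get((phi(s), a), 1.0)  # optimistic init drives exploration
    for _ in range(episodes):
        s = 0
        for _ in range(50 * n_states):        # per-episode step cap
            a = rng.choice((-1, 1)) if rng.random() < eps \
                else max((-1, 1), key=lambda u: q(s, u))
            s2 = min(max(s + a, 0), goal)
            r = 1.0 if s2 == goal else 0.0
            bootstrap = 0.0 if s2 == goal else max(q(s2, -1), q(s2, 1))
            Q[(phi(s), a)] = q(s, a) + alpha * (r + gamma * bootstrap - q(s, a))
            s = s2
            total_steps += 1
            if s == goal:
                break
    return total_steps

# Two candidate representations: raw states vs. aggregated pairs of states.
cost_raw = chain_q_learning(8, lambda s: s)
cost_merged = chain_q_learning(8, lambda s: s // 2)
```

Under the first criterion above, the representation with the lower relearning cost would be judged the more useful one.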

Two main research axes will be considered:

1. Skill extraction, organization, and refinement.

Skills are perception-action feedback loops. Instead of learning a unique unconstrained map from perceptions to actions, it is in general easier to use a finite set of skills and learn how to make appropriate switches.
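To fix ideas, here is a minimal sketch of this decomposition, in the spirit of the options framework [1]; the `Skill` interface and the two hand-designed skills are hypothetical (a learner would instead extract them from traces):

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[int, int]

@dataclass
class Skill:
    """An option-style skill: a closed perception-action loop with its own
    termination condition."""
    name: str
    policy: Callable[[State], Tuple[int, int]]  # perception -> action (dx, dy)
    done: Callable[[State], bool]               # termination predicate

def run_skills(start: State, skills: List[Skill], max_steps: int = 100):
    """Execute skills in sequence, switching when each one terminates."""
    s, trace = start, [start]
    for skill in skills:
        steps = 0
        while not skill.done(s) and steps < max_steps:
            dx, dy = skill.policy(s)
            s = (s[0] + dx, s[1] + dy)
            trace.append(s)
            steps += 1
    return s, trace

def sign(v: int) -> int:
    return (v > 0) - (v < 0)

# Two hand-designed skills on a grid: align the x coordinate, then the y.
goto_x = Skill("align-x", lambda s: (sign(4 - s[0]), 0), lambda s: s[0] == 4)
goto_y = Skill("align-y", lambda s: (0, sign(7 - s[1])), lambda s: s[1] == 7)

final, trace = run_skills((0, 0), [goto_x, goto_y])
```

The high-level problem then reduces to learning which skill to activate and when to switch, rather than learning one unconstrained perception-to-action map.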

Humans and animals tend to train isolated skills and then compose them to generate rich arrays of movements. Skills help to articulate low-level reactive behaviors and higher-level decisions.

Our specific objective will be to design skills by analyzing traces of behaviors generated via a first learning phase with "naive" representations and no skills.

A large body of relevant work exists on skill chaining and on the related concept of options [1, 2, 3], which also have ties with subgoal extraction methods [4, 5]. However, many questions remain open, especially in the case of continuous sensory inputs and only partially observable states. For instance:

  • What are the most adequate ways to represent skills?
  • Is it useful to differentiate several categories of skills?
  • How to validate the relevance of a given skill with respect to some well-defined task?
  • How to perform high-level reasoning to orchestrate the composition of skills?
  • For efficient skill construction, how to cope with the problems of delayed reward and perceptual aliasing?

Such questions will be central in the proposed PhD thesis.

2. Definition of internal states.

This axis is complementary to the first one, and aims at defining internal representations, or internal states, that depend on the perceptions and possibly on the memory.

The role of such internal states is to facilitate the selection of actions, often via an explicit or implicit dimensionality reduction. Indeed, these internal states should provide a compressed representation of the information that can be inferred from the sensory inputs. They should also be robust to noise (thus leading to more robust behaviors), and yield better generalization than actions expressed directly as functions of the raw input data. When these internal states are discrete, they can allow symbolic reasoning, and therefore their construction is closely related to the symbol grounding problem [6].
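As a deliberately simplistic sketch of such a compression (the function name is hypothetical), assume perceptions are points in a continuous space and discrete internal states are obtained by clustering them; the k-means variant below, with deterministic farthest-point initialisation, is only an illustrative stand-in for the richer construction methods considered in the thesis:

```python
import numpy as np

def perception_to_internal_states(X, k, iters=20):
    """Lloyd's k-means over continuous perceptions X of shape (n, d):
    each cluster index plays the role of a discrete internal state."""
    X = np.asarray(X, dtype=float)
    # deterministic farthest-point initialisation of the k prototypes
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # assign every perception to its nearest prototype ...
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # ... then move each prototype to the mean of its cluster
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Demo: two well-separated groups of perceptions collapse to two states.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels, centers = perception_to_internal_states(X, 2)
```

Note what this sketch ignores: memory, rewards, and noise robustness, i.e. precisely the open questions listed below.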

Again, several open questions will be examined:

  • How to decide the amount of memory that should be used in internal states specifically built for a given task?
  • How to dynamically modify internal representations?
  • How to store former internal representations and decide when to reactivate them?
  • How to take rewards into account when clustering the perception space?
  • How can relevant discrete internal states be constructed from a completely continuous experience?

The scientific literature on these questions contains a large range of potential answers, usually without consensus, and many tools exist that consider the problems from different angles. Various approaches and various data mining and machine learning algorithms will be considered to address the aforementioned questions (e.g. reinforcement learning & evolutionary algorithms, random forests, self-organizing maps, etc.). One tool of particular interest is unsupervised deep learning. It has been shown that deep networks (e.g. Deep Belief Networks [7] or Stacked Autoencoders [8]) can create neurons, in the hidden layers, that fire upon detection of somewhat high-level, abstract features of the input. Such neurons could be interesting building blocks for the construction of abstract internal states, but no method yet exists to do this directly. Some progress is required, for instance on the assessment of the "abstractness" of a neuron, or the characterization of the usefulness of a neuron with respect to a given task.
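To make the last point concrete, the toy example below trains a tied-weight *linear* autoencoder with plain gradient descent; it is a deliberately simplified stand-in for the nonlinear stacked autoencoders of [8], with all names hypothetical. The rows of `W` act as learned feature detectors, and the reconstruction error measures how much input structure the hidden layer captures:

```python
import numpy as np

def train_autoencoder(X, n_hidden, lr=0.1, epochs=300, seed=0):
    """Tied-weight linear autoencoder: encode H = X W^T, decode R = H W,
    trained by gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.1 * rng.standard_normal((n_hidden, d))
    for _ in range(epochs):
        H = X @ W.T                              # hidden activations ("features")
        E = H @ W - X                            # reconstruction error
        grad = (H.T @ E + (E @ W.T).T @ X) / n   # d/dW of 0.5*||E||^2 / n
        W -= lr * grad
    return W

def reconstruction_error(X, W):
    return float(np.mean((X @ W.T @ W - X) ** 2))

# Demo: data lying near a 1-D subspace of R^5, compressed to 2 hidden units.
rng = np.random.default_rng(1)
u = np.array([1.0, 0.0, 1.0, 0.0, 1.0]) / np.sqrt(3.0)
X = np.linspace(-1, 1, 50)[:, None] * u[None, :] \
    + 0.01 * rng.standard_normal((50, 5))
W0 = 0.1 * np.random.default_rng(0).standard_normal((2, 5))  # untrained weights
W = train_autoencoder(X, 2)
```

The open research questions remain untouched by such a sketch: nothing here tells us whether a given hidden unit is "abstract", nor whether it is useful for the robot's task.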

The PhD will be done in the context of the H2020-FETPROACT collaborative project DREAM [9]. Significant effort will be devoted to the implementation of these ideas, with systematic validation on robotic experiments or simulations that have been precisely defined by the DREAM consortium.

References:

  • [1] Sutton, R. S., Precup, D. & Singh, S. (1998). Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales. Technical report, University of Massachusetts.
  • [2] Konidaris, G., Kuindersma, S., Grupen, R. & Barto, A. (2012). Robot learning from demonstration by constructing skill trees. The International Journal of Robotics Research, 31(3), pp. 360-375.
  • [3] Stulp, F., Herlant, L., Hoarau, A. & Raiola, G. (2014). Simultaneous On-line Discovery and Improvement of Robotic Skill Options. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'14).
  • [4] McGovern, A. & Barto, A. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. Proceedings of the 18th International Conference on Machine Learning (ICML'01).
  • [5] Şimşek, Ö., Wolfe, A. P. & Barto, A. (2005). Identifying useful subgoals in reinforcement learning by local graph partitioning. Proceedings of the 22nd International Conference on Machine Learning (ICML'05).
  • [6] Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1), pp. 335-346.
  • [7] Hinton, G. E., Osindero, S. & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), pp. 1527-1554.
  • [8] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, 11, pp. 3371-3408.
  • [9] http://www.robotsthatdream.eu/

International scope

The PhD will be done in the context of the H2020-FETPROACT collaborative project DREAM and will thus involve collaborations with its members.
