Incremental social behavior planner

Proposed by: Catherine PELACHAUD
Thesis supervisor: Catherine PELACHAUD
Research unit: UMR 7222 Institut des Systèmes Intelligents et de Robotique

Field: Information and communication sciences and technologies


During an interaction we communicate not only through speech but also through a wide range of social signals. We smile at each other; we punctuate our speech with laughter; we mimic emotional expressions. These signals provide information about our emotional state, our stance on what we say, or our relationship with our interlocutor. The same utterance may be perceived very differently depending on how it is said and which nonverbal behaviors accompany it. Smiles and laughter, in particular, can serve many functions in a conversation. A laugh at the end of a sentence may mask embarrassment or convey irony; within an utterance it can signal that what has just been said should not be taken too seriously. Laughter can also serve social functions, such as indicating group belonging.

Embodied Conversational Agents (ECAs) are virtual entities with a human-like appearance that communicate both verbally and nonverbally. They are used as interfaces in human-machine interaction, taking on roles such as assistant, tutor, or companion. They are endowed with communicative capabilities, that is, they can dialog with humans using verbal and nonverbal means. In this PhD we focus on modeling ECAs able to display social signals that modulate what they say; that is, we consider the pragmatic function of social signals. We will consider several signals but will pay particular attention to smiles and laughter.

Several models have been proposed to simulate socio-emotional behaviors for ECAs in interaction. Two major issues arise: which behaviors should be displayed, and when they should be shown. Regarding the modeling of multimodal behaviors associated with social attitudes and emotions, several approaches rely on theories from the social sciences literature and on correlations between specific behavior patterns and the expression of attitudes and emotions (Malatesta et al., 2009). Bickmore and Picard (2005) incorporated findings from the psychology and social sciences literature (Argyle, 1988) to specify the behavior of their relational agent Laura. To account for the fact that the perception of a behavior may vary with the context of its display (e.g., a smile followed by a gaze shift conveys a different attitude than a smile followed by leaning toward one's interlocutor), Dermouche and Pelachaud (2016) proposed to model the expression of an attitude as a sequence of behaviors using sequence mining techniques.
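As a toy illustration of the sequence-mining idea (a minimal sketch, not Dermouche and Pelachaud's actual algorithm), one can count contiguous behavior n-grams across sessions annotated with the same attitude and keep those that reach a minimum support; all behavior labels below are invented for the example:

```python
from collections import Counter

def frequent_sequences(sessions, min_support=2, max_len=3):
    """Count contiguous behavior subsequences (n-grams of length 2..max_len)
    across annotated sessions; keep those meeting the support threshold."""
    counts = Counter()
    for behaviors in sessions:
        for n in range(2, max_len + 1):
            for i in range(len(behaviors) - n + 1):
                counts[tuple(behaviors[i:i + n])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_support}

# Toy sessions annotated as "friendly": the smile -> lean-forward pattern recurs.
friendly_sessions = [
    ["smile", "lean_forward", "nod"],
    ["gaze_at", "smile", "lean_forward"],
]
patterns = frequent_sequences(friendly_sessions, min_support=2)
# Only ("smile", "lean_forward") occurs in both sessions.
```

A real system would mine such patterns per attitude label and use the discriminative sequences, rather than isolated signals, to plan the agent's behavior.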

Several models have been proposed to simulate multimodal behaviors in ECAs. Ding et al. (2014) developed a machine learning approach to laughter synthesis, specifically what they called hilarious laughter: laughter triggered by amusing and positive stimuli such as jokes. They created a generator for face and body motions that takes as input sequences of laughter pseudo-phonemes, derived from the work of Urbain et al. (2013), together with each pseudo-phoneme's duration. It is integrated as a module in a virtual agent and triggered after a joke told by the agent or by the user (Ochs & Pelachaud, 2012). Using crowdsourcing techniques, Ravenet et al. (2013) created three basic types of smiles for a virtual agent: amused, polite, and embarrassed. Each type is associated with several possible communicative intentions; for instance, polite smiles are associated with an intention to communicate encouragement or to signal that an utterance has been understood.
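The pseudo-phoneme-driven pipeline described above can be sketched schematically as follows. The pseudo-phoneme labels, motion parameters, and intensity values are purely illustrative assumptions, not those of Ding et al. (2014):

```python
# Hypothetical mapping from laughter pseudo-phonemes (in the spirit of
# Urbain et al., 2013) to motion parameters; labels and values are invented.
PSEUDO_PHONEME_MOTIONS = {
    "h": {"jaw_open": 0.3, "lip_corner_pull": 0.7, "torso_shake": 0.2},
    "a": {"jaw_open": 0.8, "lip_corner_pull": 0.9, "torso_shake": 0.6},
    "nasal": {"jaw_open": 0.1, "lip_corner_pull": 0.5, "torso_shake": 0.1},
}

def laughter_animation(segments, fps=25):
    """Expand (pseudo_phoneme, duration_in_seconds) pairs into a list of
    per-frame motion parameter dictionaries at the given frame rate."""
    frames = []
    for phoneme, duration in segments:
        params = PSEUDO_PHONEME_MOTIONS.get(phoneme, {})
        frames.extend([params] * round(duration * fps))
    return frames

# An "h a" laugh burst: 2 frames of "h" then 5 frames of "a" at 25 fps.
frames = laughter_animation([("h", 0.08), ("a", 0.2)])
```

The actual model learns this mapping from motion capture data instead of using a fixed table, which is what allows it to capture the dynamics of real laughter motion.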

Most existing models that compute which behaviors an ECA should display rely on emotion and attitude models, that is, models that compute the emotional state the agent is in and the attitude it aims to show. These models have focused on which emotions and attitudes to show, but not on when to display them in an interaction: they have not addressed whether the agent should laugh just after telling its joke, at the end of its speaking turn, or while it speaks. Moreover, existing work has not studied the impact the placement of an emotional signal has on the perception of what is being said.

In this PhD we will address these issues. We aim to develop an ECA able to display social signals, focusing mainly on smiles and laughter. We will consider not only the emotional state of the agent but also the dialogic and pragmatic functions of its behaviors.

We foresee the following steps:

-  Enlarge the repertoire of nonverbal behaviors of the ECA. Particular attention will be paid to simulating a wide variety of laughs. We will apply machine learning techniques to capture the correlation between the acoustic features of laughter and multimodal behaviors, ensuring the model conveys the dynamics of laughter motion. We will rely on motion capture data.

-  Develop an incremental behavior model allowing the agent to update its behaviors on the fly. The behavior planner of the ECA will be able to change the agent's multimodal behaviors incrementally, by adding a new behavior to the current animation, by suppressing the current animation, or by blending the two. This task requires modifying the virtual agent framework, which currently computes the agent's behavior at the sentence level.

-  Integrate the incremental behavior model with a dialog manager that computes what the agent says. The dialog manager will output communicative intentions that the behavior planner module turns into multimodal behaviors. It will also output pragmatic functions carried out by specific signals (e.g., smiles, laughter) that will need to be added to the signals displayed by the agent.

-  Evaluate the impact of social signals on the perceived meaning of what the agent says and on the naturalness of the virtual agent's behavior.
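The add/suppress/blend operations of the envisioned incremental planner (second step above) could be sketched as follows. This is a minimal sketch under assumptions: the class and method names, the per-modality bookkeeping, and the keep-the-stronger-behavior blending rule are all invented for illustration, not part of an existing framework:

```python
class IncrementalBehaviorPlanner:
    """Toy incremental planner: behaviors already scheduled for the current
    utterance chunk can be added to, suppressed, or blended on the fly."""

    def __init__(self):
        # One active behavior per modality: modality -> (behavior, intensity)
        self.active = {}

    def add(self, modality, behavior, intensity):
        """Schedule a behavior on a modality, overwriting what was planned."""
        self.active[modality] = (behavior, intensity)

    def suppress(self, modality):
        """Re-planning: drop whatever is pending on this modality."""
        self.active.pop(modality, None)

    def blend(self, modality, behavior, intensity):
        """Merge an incoming behavior with the current one; here the
        stronger of the two intensities wins (an assumed blending rule)."""
        current = self.active.get(modality)
        if current is None or intensity >= current[1]:
            self.active[modality] = (behavior, intensity)

planner = IncrementalBehaviorPlanner()
planner.add("face", "polite_smile", 0.4)    # planned with the utterance
planner.add("gaze", "gaze_shift", 0.5)
planner.blend("face", "amused_smile", 0.7)  # pragmatic signal arrives mid-utterance
planner.suppress("gaze")                    # re-planning drops the pending gaze shift
```

A real incremental planner would additionally handle timing and animation blending at the motion level; the point of the sketch is only the chunk-level update operations, in contrast to planning a whole sentence ahead of time.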


A major challenge will be to develop an incremental behavior model. So far, most existing models compute the agent's behavior ahead of time, at the sentence level; as such, they cannot easily handle the production of new signals or the re-planning of behaviors. The pragmatic functions of signals are not embedded in the behavior model of the virtual agent. Another challenge lies in the generation of varied behavior types such as smiles and laughter.

International outlook

This work will be conducted in close collaboration with Jonathan Ginzburg, University Paris 7 Diderot, who will serve as referent for the dialog model and the pragmatic formulation of the social signals. The data used in this PhD will come from the DUEL project, coordinated by Jonathan Ginzburg. We will also collaborate with David Schlangen, University of Bielefeld, Germany, on the incremental model of the dialog. The PhD student will actively participate in the work conducted by Ginzburg and Schlangen on studying social signals within a follow-up to the DUEL project.

References

• Argyle, M. (1988). Bodily Communication. University Paperbacks, Methuen.

• Bickmore, T. and Picard, R. (2005). Establishing and maintaining long-term human-computer relationships. ACM Transactions on Computer-Human Interaction (ToCHI), 12(2): 293-327.

• Dermouche, S. and Pelachaud, C. (2016). Sequence-based multimodal behavior modeling for social agents. ICMI 2016: 29-36.

• Ding, Y., Prepin, K., Huang, J., Pelachaud, C., and Artières, T. (2014). Laughter animation synthesis. AAMAS 2014: 773-780.

• Malatesta, L., Raouzaiou, A., Karpouzis, K., and Kollias, S. D. (2009). Towards modeling embodied conversational agent character profiles using appraisal theory predictions in expression synthesis. Applied Intelligence, 30(1): 58-64.

• Ochs, M. and Pelachaud, C. (2012). Model of the perception of smiling virtual character. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS '12.

• Ravenet, B., Ochs, M., and Pelachaud, C. (2013). From a user-created corpus of virtual agent's non-verbal behavior to a computational model of interpersonal attitudes. IVA 2013: 263-274.

• Urbain, J., Cakmak, H., and Dutoit, T. (2013). Automatic phonetic transcription of laughter and its application to laughter synthesis. In Proceedings of the Fifth Biannual Humaine Association Conference on Affective Computing and Intelligent Interaction, pages 153-158.
