TLP Group Research Topics

Speaker characterization in a multimodal context

Speaker recognition consists of determining who spoke when, where the identity can be that of the true speaker or an identity specific to one document or a set of documents. Different sources of information can be used to identify the speaker in multimedia documents (the speaker's voice, what is said, or what is written). The group is leading the QCOMPERE consortium for the REPERE challenge.
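Identifying a speaker by voice is often cast as comparing a fixed-length voice embedding of a test utterance against enrolled speaker embeddings. The sketch below is a toy illustration of that idea using cosine similarity; the embedding vectors and speaker names are invented for the example, and real systems derive embeddings from acoustic features rather than hand-written lists.

```python
import math

# Toy sketch (hypothetical data): score a test utterance's voice embedding
# against each enrolled speaker's embedding and return the best match.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

enrolled = {
    "speaker_A": [0.9, 0.1, 0.2],
    "speaker_B": [0.1, 0.8, 0.3],
}

def identify(test_embedding):
    # Pick the enrolled identity whose embedding is most similar.
    return max(enrolled, key=lambda s: cosine(enrolled[s], test_embedding))

print(identify([0.85, 0.15, 0.25]))  # closest to speaker_A
```

In practice the same scoring scheme extends to diarization (who spoke when) by comparing embeddings of successive speech segments.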

Affective and social dimensions of spoken interactions

Affective and social dimension detection is applied both to human-machine interaction with robots and to the analysis of audiovisual and audio documents such as call-center data. The main research subjects in this area are the identification of emotion and social cues in human-robot interaction, emotion detection from verbal and non-verbal cues (acoustic, visual and multimodal), dynamic user profiles (emotional and interactional dimensions) in dialog for assistive robotics, and multimodal detection of anxiety applied to therapeutic serious games.

Perception and automatic processing of variation in speech

The very large corpora used for training statistical models are exploited for linguistic studies of spoken language, such as acoustic-phonetics, pronunciation variation and diachronic evolution. Automatic alignment enables studies on hundreds to thousands of hours of data, permitting the validation of hypotheses and models. This topic also studies human and machine transcription errors via perception experiments.

Automatic translation and machine learning

Research activities on statistical machine translation of speech or text focus on the design and development of novel language and translation models as well as novel decoding strategies; this activity is closely related to the development of machine learning methodologies for multilingual Natural Language Processing applications.

Speech recognition

Speech recognition is the process of transcribing the speech signal into text. Depending upon the targeted use, the transcription can be completed with punctuation and with paralinguistic information such as hesitations, laughter or breath noises. Research on speech recognition relies on supporting research in acoustic-phonetic modeling, lexical modeling and language modeling (a problem also addressed for machine translation), which is undertaken in a multilingual context (18 languages). This topic also includes research on language recognition, that is, determining the language and/or dialect of an audio document for both wideband and telephone-band speech.
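Language modeling, named above as a shared concern of speech recognition and machine translation, can be illustrated with the simplest statistical instance: a bigram model that estimates the probability of a word given the previous word from corpus counts. The toy corpus below is invented for the example; real systems train on very large corpora and add smoothing.

```python
from collections import Counter

# Minimal bigram language model sketch (toy corpus, illustrative only).
corpus = "the cat sat on the mat the cat ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def prob(prev, word):
    # Maximum-likelihood estimate of P(word | prev); real systems smooth
    # these counts to handle unseen word pairs.
    return bigrams[(prev, word)] / unigrams[prev]

print(prob("the", "cat"))  # 2 of the 3 occurrences of "the" precede "cat"
```

In a recognizer, such probabilities rescore competing transcription hypotheses; in translation, the same machinery scores target-language fluency.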

Language resources

In addition to the collection, annotation and sharing of varied corpora, this research topic addresses more general investigations on Language Resources, covering data, tools, evaluation and meta-resources (guidelines, methodologies, metadata, best practices), for spoken and written language, but also for multilingual, multimodal, and multimedia data. These activities are mostly conducted in collaboration with national and international organizations and networks.

