Speaker recognition consists of determining who spoke when, where the identity can be that of the true speaker or an identity specific to a single document or to a set of documents. Different sources of information can be used to identify the speaker in multimedia documents: the speaker's voice, what is said, or what is written. The group is leading the QCOMPERE consortium for the REPERE challenge.
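As a minimal illustration of the "who spoke when" output such systems produce, the sketch below aggregates per-speaker speaking time from a hypothetical diarization result; the segment format `(start, end, speaker)` is an assumption for illustration, not the group's actual data format.

```python
def speaking_time(segments):
    """Aggregate per-speaker speaking time from a diarization output.

    `segments` is a list of (start_sec, end_sec, speaker_label) tuples,
    a hypothetical representation of a 'who spoke when' result.
    """
    totals = {}
    for start, end, speaker in segments:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals
```

In practice the speaker labels may be anonymous cluster identifiers (diarization) or true identities (recognition), but the timeline representation is the same.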
Affective and social dimension detection is applied both to human-machine interaction with robots and to the analysis of audiovisual and audio documents such as call center data. The main research subjects in this area are the identification of emotion and social cues in human-robot interaction, emotion detection based on verbal and non-verbal cues (acoustic, visual and multimodal), dynamic user profiling (emotional and interactional dimensions) in dialog for assistive robotics, and multimodal detection of anxiety applied to therapeutic serious games.
The very large corpora used for training statistical models are also exploited for linguistic studies of spoken language, such as acoustic-phonetics, pronunciation variation and diachronic evolution. Automatic alignment enables studies on hundreds to thousands of hours of data, permitting the validation of hypotheses and models. This topic also covers the study of human and machine transcription errors via perception experiments.
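Comparing human and machine transcripts typically relies on word error rate (WER), computed by aligning the two word sequences with minimum edit distance. A minimal self-contained sketch of that standard computation (not the group's own scoring tool):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: Levenshtein distance over words, divided by
    the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

The same alignment machinery, applied at the phone or word level against the audio, is what makes large-scale acoustic-phonetic studies on thousands of hours feasible.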
Robust analysis methods for spoken language are developed in the framework of open-domain information retrieval, with applications to language understanding for dialog systems, to named-entity recognition, and to interactive question answering systems supporting both spoken and written language.
Research activities on statistical machine translation of speech or text focus on the design and development of novel language and translation models as well as novel decoding strategies; this activity is closely related to the development of machine learning methodologies for multilingual Natural Language Processing applications.
Speech recognition is the process of transcribing the speech signal into text. Depending upon the targeted use, the transcription can be enriched with punctuation or with paralinguistic information such as hesitations, laughter or breath noises. Research on speech recognition relies on supporting research in acoustic-phonetic modeling, lexical modeling and language modeling (a problem also addressed for machine translation), which are undertaken in a multilingual context (18 languages). This topic also includes research on language recognition, that is, determining the language and/or dialect of an audio document, for both wideband and telephone-band speech.
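To make the language modeling component concrete, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, the textbook baseline for the statistical language models mentioned above; it is illustrative only, far simpler than the models actually used for transcription or translation.

```python
from collections import defaultdict

def train_bigram_lm(sentences):
    """Train a bigram language model with add-one (Laplace) smoothing.

    `sentences` is a list of token lists; <s> and </s> mark sentence
    boundaries. Returns a function prob(a, b) estimating P(b | a).
    """
    unigram = defaultdict(int)   # counts of history words
    bigram = defaultdict(int)    # counts of (history, word) pairs
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        for a, b in zip(tokens, tokens[1:]):
            unigram[a] += 1
            bigram[(a, b)] += 1
    V = len(vocab)

    def prob(a, b):
        # Add-one smoothing: unseen bigrams get a small non-zero mass
        return (bigram[(a, b)] + 1) / (unigram[a] + V)

    return prob
```

In a recognizer, such a model scores candidate word sequences proposed by the acoustic and lexical models; the same estimation problem recurs in machine translation, as noted above.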
In addition to the collection, annotation and sharing of varied corpora, this research topic addresses more general investigations on Language Resources, covering data, tools, evaluation and meta-resources (guidelines, methodologies, metadata, best practices), for spoken and written language, but also for multilingual, multimodal and multimedia data. These activities are mostly conducted in collaboration with national and international organizations and networks.