TLP Group - Presentation
The Spoken Language Processing group carries out research aimed at
understanding the human speech communication processes and developing models
for use in automatic processing of speech. This research is by nature
interdisciplinary, drawing upon expertise in signal processing,
acoustic-phonetics, phonology, semantics, statistics and computer science. The
group's research activities are validated by developing systems for automatic
processing of spoken language such as speech recognition, language
identification, multimodal characterization of speakers and their affective
state, named-entity extraction and question answering, spoken dialog,
multimodal indexing of audio and video documents, and machine translation of
both spoken and written language.
With the aim of extracting and structuring the information in audio documents, the
group develops models and algorithms that use diverse sources of information
to carry out a global decoding of the signal. These can be applied to identify
the speaker, the language being spoken (if it is not known a priori) and the
affective state, to transcribe or translate the speech, or to identify
specific entities.
Speech recognition is the process of transcribing the speech signal
into text. Depending upon the targeted use, the transcription can be enriched
with punctuation and with paralinguistic information such as hesitations,
laughter or breath noises. Research on speech recognition relies on supporting
research in acoustic-phonetic modeling, lexical modeling and language modeling
(a problem also addressed for machine translation), which are undertaken in a
multilingual context (18 languages).
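As a general illustration of how these modeling components interact (the standard statistical formulation, not a description of any particular system developed by the group), a recognizer searches for the word sequence W that best explains the acoustic observations X by combining the two models:

    \hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} P(X \mid W)\, P(W),

where P(X | W) is the acoustic model and P(W) is the language model estimated from text.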
Statistical machine translation is currently an area of intensive research for
the group, with the development of novel language and translation models
as well as novel decoding strategies. This research area is closely related to
the development of machine learning tools, with two major achievements: the
Wapiti open-source software for linear-chain CRFs, and the development of new
tools for neural network language model training.
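For reference, a linear-chain CRF of the kind implemented in such toolkits (the standard textbook formulation, not a description of the group's specific models) defines the conditional probability of a label sequence y given an observation sequence x as

    p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),

where the f_k are feature functions, the \lambda_k are weights learned from annotated data, and Z(x) is a normalization term summing over all possible label sequences.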
Detection of affective and social dimensions is being applied both to
human-machine interaction with robots and to the analysis of audiovisual
documents such as call center data. The main research subjects in this area are
speaker and emotion identification in human-robot interaction, emotion
detection in client/agent interaction, emotion detection based on acoustic,
visual and physiological cues for assistive robotics, and multimodal detection
of anxiety applied to therapeutic serious games.
Robust analysis methods for spoken language are being developed in
the framework of open-domain information retrieval, with applications to
language understanding for dialog systems, to named-entity recognition, and
to interactive question answering systems supporting both spoken and written
language.
As of December 2011, the group has 43 members: 12 permanent CNRS staff, 6 research
associates, 11 postdocs, 2 contractual research staff, and 12 doctoral
students. In addition to its research activities, the group is responsible
for several graduate level speech processing courses, principally at the
University of Paris-Sud. In 2010 and 2011 the members of the group published
147 articles (21 in journals, 28 chapters in books, and 97 reviewed conference
papers).