Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
 

Spoken Language Processing Group (TLP)

Analysis, synthesis and perception


This theme deals with speech and voice analysis, synthesis and perception. Research in speech synthesis focuses on French and Spanish. Voice analysis research is carried out on both speech and the singing voice, with potential applications to the acoustics of voice pathology. Speech and voice are studied in a multi-disciplinary framework: electrical engineering (computer science, signal processing and acoustics), linguistics (phonetics, phonology and syntax) and psychology (psychoacoustics and cognition). Analysis, synthesis and perception are intimately tied. For instance, the stimuli used for perception research are produced with the help of synthesis, and perceptual results are in turn used to constrain synthesis models. Voice source analysis is another example of the interplay between analysis and synthesis: on the one hand better source analysis is needed for synthesis, and on the other hand synthesis is a tool for testing voice source models.

Analysis

Intra-speaker and inter-speaker voice quality analysis is central to our speech and voice analysis research. The main topics studied are voice source analysis, vocal effort analysis, the recording of acoustic databases, and the comparison of acoustic and physiological signals. Voice quality depends mainly on the voice source. A theoretical investigation of glottal flow models was conducted, and the correspondence between time-domain models and analytic spectra has been worked out for the main glottal flow models. With the help of this correspondence, new algorithms for open quotient estimation were developed. We compared spectral estimates of the open quotient with electroglottographic signals. Voice source analysis using both acoustic and physiological signals is also being studied. Vocal effort is an important source of variability in speech. Variations in fundamental frequency, duration, and glottal waveform parameters have been studied for various levels of vocal effort. This study made use of the CORENC database, which contains multi-speaker isolated vowels recorded with different vocal efforts. Vocal effort was also studied using analysis/synthesis techniques based on our results on glottal flow analysis. A bibliography on time-frequency methods for speech signal analysis continues to be maintained and extended.
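The link between a time-domain glottal pulse and its spectrum can be illustrated with a small sketch. It is not one of the glottal flow models or estimation algorithms mentioned above: the raised-cosine pulse, the function names, and the grid search are all assumptions made for illustration. The sketch builds one period of a toy glottal flow for a given open quotient, measures the level difference between the first two harmonics (H1-H2), and then recovers the open quotient from that spectral measure.

```python
import cmath
import math


def toy_glottal_period(open_quotient, n_samples=400):
    """One period of a toy glottal flow (assumption: raised-cosine open phase,
    zero flow during the closed phase). Not a published glottal flow model."""
    n_open = max(2, int(round(open_quotient * n_samples)))
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * i / n_open)) if i < n_open else 0.0
        for i in range(n_samples)
    ]


def harmonic_level_db(period, k):
    """Level in dB of the k-th harmonic, from the Fourier series of one period."""
    n = len(period)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(period))
    return 20.0 * math.log10(max(abs(coeff) / n, 1e-12))


def h1_minus_h2(open_quotient):
    """Difference in dB between the first and second harmonic of the toy pulse."""
    period = toy_glottal_period(open_quotient)
    return harmonic_level_db(period, 1) - harmonic_level_db(period, 2)


def estimate_open_quotient(measured_h1_h2_db):
    """Hypothetical estimator: pick the open quotient (0.30 to 0.90) whose toy
    pulse best matches the measured H1-H2 difference."""
    candidates = [i / 100.0 for i in range(30, 91)]
    return min(candidates, key=lambda oq: abs(h1_minus_h2(oq) - measured_h1_h2_db))


if __name__ == "__main__":
    for oq in (0.4, 0.6, 0.8):
        print(f"OQ={oq:.1f}  H1-H2={h1_minus_h2(oq):.2f} dB")
    print("recovered OQ:", estimate_open_quotient(h1_minus_h2(0.6)))
```

In this toy setting the H1-H2 difference grows as the open quotient increases from 0.4 to 0.8, which is what makes a spectral estimate possible. Real speech would additionally require removing the vocal tract contribution before measuring the harmonics.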

Speech Synthesis

Efforts on Text-To-Speech (TTS) synthesis concern several main aspects. Linguistic procedures for French TTS were studied in depth as part of the doctoral thesis of Ph. Boula de Mareüil, defended in December 1997. The first part of this thesis work addressed grapheme-to-phoneme conversion. The conversion rules have been improved, particularly for difficult cases such as acronyms, proper names and abbreviations. A fast and robust syntactic parser suited to the needs of TTS was developed. This deterministic parser is based on a chunk grammar that segments texts into syntagms corresponding to prosodic words. The parser has been tested and integrated into the French TTS system. Much effort has been devoted to TTS evaluation in the framework of the Aupelf-B3 project. This project gathers participants from 9 laboratories in French-speaking countries. In a common evaluation of grapheme-to-phoneme conversion for French TTS, the LIMSI system obtained the best result among the 8 systems tested. This year a complete real-time TTS system for Castilian Spanish (Madrid dialect) was developed using the same architecture as the French system.
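The idea of deterministic chunking into prosodic words can be sketched as follows. This is only a toy heuristic, not the chunk grammar of the LIMSI parser: the function-word list and the rule "a chunk ends at the first content word" are assumptions chosen to keep the example short.

```python
# Hypothetical, deliberately tiny list of French function words (assumption).
FUNCTION_WORDS = {
    "le", "la", "les", "un", "une", "des", "de", "du",
    "au", "aux", "sur", "dans", "et", "il", "elle", "ne", "se",
}


def chunk(sentence):
    """Group each run of function words with the content word that follows it.

    The scan is a single left-to-right pass with no backtracking, which is
    what makes it deterministic."""
    chunks, current = [], []
    for word in sentence.lower().split():
        current.append(word)
        if word not in FUNCTION_WORDS:
            # A content word closes the current chunk.
            chunks.append(" ".join(current))
            current = []
    if current:
        # Trailing function words with no content word form their own chunk.
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    print(chunk("Le chat de la voisine dort sur le tapis"))
```

A real chunk grammar would rely on part-of-speech information rather than a fixed word list, but the single-pass structure shows why such a parser can be both fast and robust.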

Perception

A series of experiments has been conducted to study the perceptual space of vocal timbre at the syllable level. A database was recorded containing repetitions of the same sentence produced by a wide range of speakers (from 5 to 75 years old) and with a wide range of speaking styles. Preliminary results indicate that intra-speaker variability might be perceptually more salient than inter-speaker variability. A study of pitch perception for fundamental frequency glissandos has been completed, and a weighted time-average model has been proposed. As a continuation of this work, experiments are being conducted on the interaction between periodicity pitch and spectral pitch in speech perception. Concerning our work on pattern processing, the inductive inference process has been successfully applied to sets of thousands of examples and adapted to noisy data. A new method for signal coding in cochlear implants is being tested, in collaboration with the Phoniatric Dept. at the Hôpital Saint Antoine (Paris). The noises that are most disturbing for cochlear implant users have been identified.
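A weighted time-average of a fundamental frequency track can be written as p = sum(w_i * f0_i) / sum(w_i). The sketch below illustrates that general form only. The weighting function is an assumption (an exponential that gives more weight to the end of the tone), and the parameter names and values are hypothetical; the weights of the model proposed in this work are not given here.

```python
import math


def weighted_time_average(f0_track, frame_s=0.01, tau_s=0.1):
    """Weighted time-average of an F0 track sampled every frame_s seconds.

    Assumption for illustration: weights grow exponentially toward the end of
    the tone with time constant tau_s. The result is in the same unit as the
    input track."""
    n = len(f0_track)
    if n == 0:
        raise ValueError("empty F0 track")
    duration = n * frame_s
    weights = [math.exp(-(duration - (i + 1) * frame_s) / tau_s) for i in range(n)]
    return sum(w * f for w, f in zip(weights, f0_track)) / sum(weights)


if __name__ == "__main__":
    rising = [200.0 + i for i in range(21)]  # 200 Hz to 220 Hz over 21 frames
    print("plain mean   :", sum(rising) / len(rising))
    print("weighted mean:", weighted_time_average(rising))
```

With this particular weighting, the value predicted for a rising glissando lies between the plain mean and the final frequency, and a flat tone is left unchanged.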

Intonation

Correct intonation is essential for obtaining synthetic speech of acceptable quality. Several difficulties need to be overcome. Firstly, intonation is governed by multiple, poorly defined factors, both linguistic and non-linguistic in nature. Secondly, the perception of intonation is not well understood. We have addressed the latter problem by carrying out a series of psychoacoustical experiments which studied in detail the perception of pitch glissandi. A perceptual model of intonation was implemented based on an automatic analysis of tone. Given the signal and its phonetic labelling, we automatically derive a series of syllabic tones perceptually equivalent to the melodic curve of the signal. This approach allows us to assess the validity of the perceptual model for intonation, as well as the automatic stylization of melodic curves for speech synthesis. Mid-way between speech recognition and analysis, we have investigated the extraction of robust intonative parameters and the automatic recognition of intonation. The aim was to identify the intonative structures found in a spontaneous speech corpus containing continuous speech from several speakers. A statistical intonation recognition system was implemented. It was necessary to introduce a new definition of the relevant intonative parameters in order to robustly determine the intonation patterns in the multispeaker continuous speech corpus.
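The step from a melodic curve plus a syllable labelling to a series of syllabic tones can be sketched as follows. This is a simplified illustration, not the perceptual model described above: the threshold value, the tone categories, and the data layout are all assumptions. Each syllable is reduced either to a static tone (its mean pitch) or to a glide (its first and last pitch), depending on whether the pitch change within the syllable exceeds a hypothetical threshold.

```python
import math


def hz_to_semitones(f_hz, ref_hz=100.0):
    """Convert a frequency to semitones relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)


def stylize(f0_points, syllables, glide_threshold_st=2.0):
    """Reduce an F0 curve to one tone per syllable.

    f0_points: list of (time_s, f0_hz), with f0_hz == 0 for unvoiced frames.
    syllables: list of (label, start_s, end_s).
    glide_threshold_st: hypothetical threshold (semitones) above which the
    change inside a syllable is kept as a glide instead of a static tone."""
    tones = []
    for label, start, end in syllables:
        values = [hz_to_semitones(f) for t, f in f0_points if start <= t < end and f > 0]
        if not values:
            tones.append((label, "unvoiced"))
            continue
        change = values[-1] - values[0]
        if abs(change) >= glide_threshold_st:
            tones.append((label, "glide", round(values[0], 1), round(values[-1], 1)))
        else:
            tones.append((label, "static", round(sum(values) / len(values), 1)))
    return tones


if __name__ == "__main__":
    f0 = [(0.00, 100.0), (0.10, 100.0), (0.20, 100.0), (0.30, 120.0), (0.40, 150.0)]
    print(stylize(f0, [("bon", 0.0, 0.2), ("jour", 0.2, 0.5)]))
```

A perceptually motivated version would make the threshold depend on the duration of the syllable and would weight the pitch values rather than taking a plain mean.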
  • Some publications on speech synthesis
  • Some publications on intonation
  • Some publications on perception
  • Sound examples
    ©LIMSI-CNRS, Orsay-France, 1997-2009
    Last modified: Sunday,11-December-05 06:13:34 CET