Confidence Measures for Alignment and for Machine Translation

Thesis by Yong XU, defended on 26 September at LIMSI


In computational linguistics, the relations between languages are often studied through automatic alignment techniques. Such alignments can be established at various structural levels. In particular, sentential and sub-sentential bitext alignments are an important source of information for many modern Natural Language Processing (NLP) applications, a prominent one being Machine Translation (MT). Computing bitext alignments effectively, however, can be challenging. Discrepancies between languages appear at many levels, from discourse structure to morphological constructions. As a result, automatic alignments will, in most cases, contain noise that harms the performance of the applications that use them. Two research directions address this situation: the first is to keep improving alignment techniques; the second is to develop reliable confidence measures that let applications use alignments selectively, according to their needs. Both alignment techniques and confidence estimation can benefit from manual alignments, which can serve both as supervision examples for training scoring models and as evaluation material. Creating such data is, however, a significant problem in itself, particularly at sub-sentential levels, where cross-lingual correspondences may be only implicit and difficult to capture. This thesis focuses on means of acquiring useful sentential and sub-sentential bitext alignments.

The contributions have been applied to a real-world application: the development of a bilingual reading tool designed to make reading in a foreign language easier.


Bitext alignment, Confidence Measures, Machine Translation


Mr. Philippe LANGLAIS, Université de Montréal, Reviewer
Mr. Olivier KRAIF, Université Grenoble Alpes, Reviewer
Mr. Pierre ZWEIGENBAUM, LIMSI, Examiner
Mr. Yannick ESTEVE, Université du Maine, Examiner
Mr. Stéphane HUET, Université d’Avignon et des Pays de Vaucluse, Examiner
Mr. François YVON, Université Paris-Sud, Thesis supervisor