Matthieu Labeau, LIMSI, PhD student, TLP

28 November 2017, 11:30 a.m.:

Character and Subword-Based Word Representation for Neural Language Modelling Prediction

Most neural language models use some kind of embedding for word prediction. Word embeddings can be associated with each word in the vocabulary, or derived from characters or from a factored morphological decomposition, but these representations are mainly used to parametrize the input, i.e. the context of prediction. This work investigates the effect of using subword units (characters and factored morphological decomposition) to build output representations for neural language modeling. We present a case study on Czech, a morphologically rich language, experimenting with different input and output representations. Our experiments show that augmenting the output word representations with character-based embeddings can significantly improve the performance of the model. This work was published at SCLeM (Workshop at EMNLP) 2017.
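
As a rough illustration of what "augmenting the output word representations with character-based embeddings" can mean, here is a minimal sketch in plain Python. It is not the model presented in the talk: the toy vocabulary, the dimension `DIM`, the averaging of character vectors in `char_repr`, and the concatenation in `output_repr` are all assumptions chosen for brevity, and the random `context` vector merely stands in for the hidden state a real language model would compute.

```python
import math
import random

random.seed(0)

DIM = 4  # embedding size (illustrative assumption)
VOCAB = ["pes", "psa", "psi", "kniha"]  # toy word forms (hypothetical vocabulary)
CHARS = sorted({c for w in VOCAB for c in w})


def rand_vec(n):
    """Small random vector standing in for trained parameters."""
    return [random.uniform(-0.1, 0.1) for _ in range(n)]


# One lookup vector per word and one per character.
word_emb = {w: rand_vec(DIM) for w in VOCAB}
char_emb = {c: rand_vec(DIM) for c in CHARS}


def char_repr(word):
    """Character-based vector: mean of the character vectors (assumed composition)."""
    vecs = [char_emb[c] for c in word]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def output_repr(word):
    """Augmented output representation: word vector concatenated with the character-based vector."""
    return word_emb[word] + char_repr(word)


def predict(context):
    """Softmax over dot products between the context vector and each output representation."""
    scores = [sum(h * r for h, r in zip(context, output_repr(w))) for w in VOCAB]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {w: e / total for w, e in zip(VOCAB, exps)}


context = rand_vec(2 * DIM)  # placeholder for the language model's hidden state
probs = predict(context)
```

Because related word forms share most of their characters, their output representations share parameters through `char_emb`, which is the intuition behind using subword units on the output side for a morphologically rich language.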

Campus universitaire bât 507
Rue du Belvedère
F - 91405 Orsay cedex
Tel. +33 (0) 1 69 15 80 15





