Toward Automatic Fact-Checking of Statistic Claims

Thesis by Tien-Duc CAO, supervised by Ioana MANOLESCU and Xavier TANNIER. Defense on 26 September 2019 at 14:30 at INRIA Palaiseau.


Nathalie Aussenac-Gilles, Research Director, IRIT -- Reviewer

Paolo Papotti, Associate Professor (HDR), EURECOM -- Reviewer

Philippe Pucheral, Professor, UVSQ and Inria -- Examiner

Julien Leblay, Researcher, AIST Japan -- Examiner

Philippe Lamarre, Professor, ENSA Lyon -- Examiner

Ioana Manolescu, Research Director, Inria and Ecole Polytechnique -- Thesis supervisor

Xavier Tannier, Professor, Sorbonne Université -- Thesis co-supervisor

Data journalism and journalistic fact-checking are attracting growing interest within the journalism community and among the public at large, given the recent concern over misinformation, manipulation through the media, and journalistic efforts to prevent and debunk such attempts. This thesis was developed in a collaboration between several research laboratories and Les Décodeurs, the fact-checking team of the Le Monde newspaper. It proposes an end-to-end approach toward the automated fact-checking of statistical claims on a topic covered by a reference (trusted) database. First, we devised an approach for extracting Linked Open Data from the Web publications of INSEE, the leading French statistics institute. Second, we developed an original search algorithm which, given a set of keywords such as "unemployment rate France 2018", returns the datasets (and, if possible, the exact values within the datasets) deemed most relevant to the user's keywords. Third, we developed an approach for automatically identifying, in a text written in French, mentions of statistical entities, together with the values the text associates with these entities and other context terms (e.g., time or place) attached to the statistical claim. Together, these enable a semi-automated statistical claim verification pipeline, in which claims are extracted automatically from text and a query is sent to our data retrieval algorithm, which returns the reference information closest to the given query. A human user, e.g., a journalist, can then compare the data to the claimed value and interpret it as part of a fact-checking work. This thesis was carried out within the ANR ContentCheck project, which focuses on models, algorithms and tools for data journalism and journalistic fact-checking (
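To make the shape of the pipeline concrete, here is a minimal sketch in Python of its three stages: claim extraction, keyword search over reference data, and a side-by-side result for a human to judge. It is an illustration under strong assumptions, not the method developed in the thesis: the names `REFERENCE`, `extract_claim`, `search` and `check` are hypothetical, the reference values are placeholders rather than real INSEE figures, extraction is reduced to a naive regular expression on an English sentence, and relevance is reduced to counting shared keywords.

```python
import re

# Toy reference table. Placeholder values, NOT real INSEE figures.
REFERENCE = [
    {"keywords": {"unemployment", "rate", "france", "2018"}, "value": 1.0},
    {"keywords": {"unemployment", "rate", "france", "2017"}, "value": 2.0},
    {"keywords": {"population", "france", "2018"}, "value": 3.0},
]


def extract_claim(text):
    """Return (keywords, claimed_value) from a sentence using naive regexes."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    claimed = float(match.group(1)) if match else None
    # Drop the percentage so its digits are not mistaken for keywords.
    rest = text[:match.start()] + text[match.end():] if match else text
    keywords = set(re.findall(r"\w+", rest.lower()))
    return keywords, claimed


def search(keywords, reference=REFERENCE):
    """Return the reference entry sharing the most keywords with the query."""
    return max(reference, key=lambda entry: len(entry["keywords"] & keywords))


def check(text):
    """Pair the claimed value with the closest reference value."""
    keywords, claimed = extract_claim(text)
    best = search(keywords)
    return {
        "claimed": claimed,
        "reference": best["value"],
        "matched": sorted(best["keywords"] & keywords),
    }


result = check("The unemployment rate in France was 1.5% in 2018")
print(result)
```

The sketch deliberately stops at returning both values: as in the semi-automated pipeline described above, deciding whether the claim holds is left to the human user.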

