LIMSI-CNRS

LIMSI Spoken Language Processing Group (TLP)









CLASS LREC 2000 Satellite Workshop on Evaluation


This half-day workshop was held in conjunction with the LREC Conference in Athens on May 30, 2000. The theme of the workshop was "what exists, how it's used and what is needed in terms of Evaluation" within Human Language Technology programs, such as the European IST-HLT program, the US HLT or TIDES programs or those organized by NIST, their Japanese counterparts, or other national programs.

Both the evaluation of technologies (systems and components) and the evaluation of applications (user-oriented/usage evaluation) were targeted across various areas of HLT processing encompassing, but not restricted to, written and spoken language processing: textual and multimedia document retrieval, cross-lingual information retrieval, message and document understanding, topic detection and tracking, summarization, machine translation, speech recognition and dictation, speech synthesis, speech coding, and oral and multimodal dialog. Experience and methodologies from related domains, such as OCR, face recognition, and image segmentation and understanding, were also considered.

This workshop concerned programs, both national and transnational, as well as projects where evaluation is used to compare technologies and approaches, or to measure progress. Participants and organizers of past or present evaluation programs across the world were encouraged to attend, as were institutions or companies using or including evaluation in their R&D activities. Former EC-supported projects in particular were invited to present their experience in that field, and the newly selected EC IST-HLT projects were expected to present their approach to, or their needs for, objective and subjective evaluation.

We hoped that the workshop presentations would give us a better understanding of how the various uses and needs of evaluation in the specific context of projects or programs relate to what is available in terms of tools, resources, practices and methodologies for evaluation.

We expected to gather a wide range of actors interested or involved in evaluation for Human Language Processing (spoken and written), or related domains, in the hope that it could trigger, in the long term, multilingual and crosslingual international cooperation activities. In addition, we hoped that this workshop could be a kind of trade fair for Human Language Technologies and Applications Evaluation.

On the day of the workshop, there were 28 registered participants and around 60 people in the room, among them the inventor of Tree Adjoining Grammars, Prof. Aravind Joshi (IRCS - University of Pennsylvania). In his introductory speech, Joseph Mariani (Limsi-CNRS) recalled what was at stake in developing evaluation for Natural Language Processing: identifying new research directions, technological and scientific progress, and better visibility for the domain. He also commented on the wide diversity of types (experiment reports, evaluation campaign reports, theoretical and prospective studies) and topics offered by the workshop presentations. The theme of the first session, addressed by 9 presentations, was experience reports. During this session, the discussions on parsing evaluation were rather technical, while Text Retrieval and its related issues got the lion's share, with lots of comments from the audience on methodological, technical and infrastructural aspects. From the debates, it seems that automatic translation evaluation is making a comeback, particularly in the context of Information Retrieval, and that evaluation of Spoken Language Dialog Systems still faces the same challenges despite ambitious programs like Communicator in the United States or Smartkom (the Verbmobil follow-up) in Germany.

The next session was more theory-oriented, with 3 presentations. The more philosophical orientation given to the last presentation by Gerard Sabbah (Limsi-CNRS) was well received by the audience, which also showed interest in the pragmatism and prospects offered by the idea, presented by Lynette Hirschman (MITRE), of reusing reading comprehension tests for evaluation purposes.

The last session took the form of a panel with: Donna Harman (NIST, USA), Stéphane Chaudiron (MR, France), Adam Kilgariff (ITRI, United Kingdom), Édouard Geoffrois (DGA, France), Khalid Choukri (ELRA), Gerhart Budin (U. Vienna, Austria, SALT project), Rémi Zajac (New Mexico State University, USA, Transaccount project), and Lazaros Polymnenakos (IBM, Greece, Catch-2004 project). Each panelist briefly presented his or her views on evaluation before the general discussion with the audience took place. The issues raised during the debates were: the opposition between Technology Evaluation and Usage Evaluation (the two kinds appear to be complementary), the case for a European infrastructure for evaluation (in particular in the context of ongoing cooperation with the United States), copyrights and access to resources, portability across languages, evaluation models for terminology, and applications and packages for evaluation.

In his concluding speech, Joseph Mariani said that he saw the workshop as renewed proof of the fully scientific nature of evaluation in Language Engineering, which requires solving both technological and theoretical issues. He also commented on the participation of the North American researchers and pioneers of the domain, Donna Harman and Dave Pallett from NIST, saying that thanks to their efforts, Language Technology had evolved from the Middle Ages to the Renaissance, because they had brought the means to objectively measure advances and progress in the field.


[Last updated: Mon Jan 29th 2001]