This half-day workshop is held in conjunction with the LREC Conference. The theme of the workshop is "what exists, how it's used and what is needed in terms of Evaluation" within Human Language Technology programs, such as the European IST-HLT program, the US DARPA programs or those organized by NIST, their Japanese counterparts, or other national and transnational programs.
| 9:00 -- 9:05 | Opening address: Joseph Mariani (LIMSI-CNRS, France) |
| Session: Experiences in HLT Evaluation (Chair: Patrick Paroubek, LIMSI-CNRS, France) | |
| 9:05 -- 9:20 | Fei Xia and Martha Palmer (Dept of Computer and Information Science, University of Pennsylvania, USA) Evaluating the Coverage of LTAGs on Annotated Corpora |
| 9:20 -- 9:35 | Rashmi Prasad and Anoop Sarkar (IRCS, University of Pennsylvania, USA) Comparing Test-suite based evaluation and Corpus-based evaluation of a wide-coverage grammar for English |
| 9:35 -- 9:50 | Béatrice Daille (IRIN, Université de Nantes, France) Evaluating a Multi-Word Term Indexing System: Method, Implementation and Report |
| 9:50 -- 10:05 | Kyo Kageura (NACSIS, Japan) IR/IE/Summarization Evaluation Projects in Japan |
| 10:05 -- 10:20 | Monika Höge (University of Helsinki, Finland) A Framework for the Quantitative and Qualitative Evaluation of Translator's Aids Systems |
| 10:20 -- 10:35 | Rémi Zajac (Computing Research Laboratory, New Mexico State University, USA) Evaluation of the Machine Translation of Financial Documents |
| 10:35 -- 10:50 | Claude de Loupy and Patrice Bellot (LIA, Université d'Avignon, France) Evaluation of Document Retrieval Systems |
| 10:50 -- 11:05 | Ellen M. Voorhees and Dawn M. Tice (National Institute of Standards and Technology, USA) Implementing a Question Answering Evaluation |
| 11:05 -- 11:20 | Niels Ole Bernsen and Laila Dybkjaer (NIS, University of Southern Denmark, Denmark) Is that a Good Spoken Language Dialogue System? |
| 11:20 -- 11:40 | Coffee break |
| Session: Issues and Prospects in HLT Evaluation (Chair: Niels Ole Bernsen, NIS, University of Southern Denmark, Denmark) | |
| 11:40 -- 11:55 | Patrick Paroubek (Spoken Language Processing Group, LIMSI-CNRS, France) Categorical Data-Specification for Control Task Formalization and Validation in Quantitative Black Box Evaluation |
| 11:55 -- 12:10 | Lynette Hirschman (MITRE, USA) Reading Comprehension and Question-Answering: New Evaluation Paradigms for Human Language Technology |
| 12:10 -- 12:25 | Gérard Sabah (Language and Cognition Group, LIMSI-CNRS, France) To Validate or not to Validate? - Some difficulties for a scientific evaluation of natural language processing systems |
| Panel Session: The Future of HLT Evaluation (Chairs: Lynette Hirschman, MITRE, USA & Patrick Paroubek, LIMSI-CNRS, France) | |
| 12:30 -- 13:05 | To initiate the Panel Session, the participants will present, in 5 minutes (max), their views on how to use evaluation in Human Language Technology projects: What will be evaluated? What are the evaluation techniques? Which resources are needed? What is the interest of using evaluation within projects? How does technology evaluation relate to usage evaluation? How should evaluation be included in present and future HLT programs? Is it possible to share resources, tools or expertise on evaluation across projects? How should international cooperation be conducted in this framework? |
| 13:05 -- 13:25 | Continuation of the Panel Session with participation of the audience. |
| 13:25 -- 13:30 | Closing statement: Joseph Mariani (LIMSI-CNRS, France) |