A Questionnaire about EVALUATION in Language Engineering

by the ELSE project LE4-8340

Following the LREC conference in Granada and the "Towards a European Open Evaluation Infrastructure in Speech and NL" satellite workshop, organized by Elsnet and ELSE, the ELSE consortium would like to offer you the opportunity to influence the shape and composition of evaluation in Natural Language Processing (covering both text and speech processing) in the context of the Fifth Framework Programme (FP5) of the European Commission.

To this end, we would like your opinion on the following points, which we have grouped into 4 sections of unequal size: (I) General, (II) Technology Developer, (III) Technology User, (IV) Infrastructure.

Please note that replying should take no more than ten minutes. If you do not have enough time to answer all the questions, or prefer to skip some of them, please send us a partially completed form anyway (to send the form, press the send button at the end of this form); it will still help us a lot, particularly if we receive it before 30 September 1998.

Where appropriate, questions may be treated as multiple-choice (you may select more than one answer).

Of course, your reply will be kept anonymous.

In case you are unfamiliar with some of the terms used below, a short glossary is provided at the end of the questionnaire.

If you have any questions, please do not hesitate to contact us.

Let's start!



I ) GENERAL


Q0 ) What is your background?

Your institution is a:
Large Corporate Industry
Small or Medium-sized Enterprise (SME)
Private Research Institution
University
Public Research Institution
Funding Agency
Other (please describe):

It is spread over:
town(s), country(ies), continent(s)

Its head-office is located in: (country).

Name (OPTIONAL & CONFIDENTIAL):
Institution (OPTIONAL & CONFIDENTIAL):
Position (OPTIONAL & CONFIDENTIAL):


Q1 ) Do you think evaluation is important for Language Engineering (LE) R&D?

Yes
No

If your answer is No, you have completed the questionnaire!


Q2 ) Are you interested in LE evaluation as:

a technology developer
a technology user
both
other (please describe):
If you checked "technology developer" or "both", please continue with question Q3; if you checked "technology user", please go to question Q11; otherwise, please go to question Q19.



II ) TECHNOLOGY DEVELOPER

Q3 ) Would you be willing to participate in a comparative evaluation exercise at the European level?

Yes
No


Q4 ) Would you or your institution agree to contribute some staff effort to participating in such exercises?

To manage your participation in such an event:

None
No more than (s) per year

To customize, port, adapt or format your system or data to other languages, application domains or target customers:

None
No more than (s) per year


Q5 ) Which control task(s) would you wish the evaluation to use?

As a guide, here is a non-exhaustive list of possible candidate control tasks (please select as many as you wish).
For each task, the aspect concerned is indicated: G=Generic, S=Speech and T=Text.
Square brackets [] enclose examples of previous evaluation campaigns that used (or are using) the control task mentioned.
G Language Models [ARC B1/Aupelf/FR/95]
G Translation Memories (sub-sentence level matching and partial clause analysis)
S Machine Translation [DARPA/USA/92/93/94]
T " "
S Multilingual data alignment [ARC A2/Aupelf/FR/95]
T " " "
T Terminology Extraction [ARC A3/Aupelf/FR/95]
S Document Retrieval [TREC/DARPA/USA/92-98]
T " "
S Text Understanding (information template filling) [MUC/DARPA/USA/87-97]
T " "
T Text Generation (from information templates)
T Summary Generation [SUMMAC/DARPA/USA/98]
T Text Segmenting
S Speech Segmenting
S Continuous Speech Recognition [DARPA/USA/84-98 and ARC B1/Aupelf/FR/95]
S Speech Synthesis [ARC B3/Aupelf/FR/95]
S Topic Detection and Tracking [TDT/DARPA/USA/98]
T " " " "
T POS tagging [GRACE-CNRS/FR/94-98]
T Parsing [SPARKLE/EU/96]
T Lemmatization [Morpholympics/Germany/94]
T Word Sense Disambiguation [SENSEVAL/98]
T Predicate Argument Structure
S Coreference Identification [DARPA/USA/95]
T " "
S Named Entity Extraction [DARPA/USA/95-98]
T " " "
S Database querying for tourist/travel information [EuroSpeech97/Elsnet and ARC B2/Aupelf/FR/95]
T Handwriting Recognition [NIST/USA/92]
S Language Identification
T " "
S Speaker Verification

other (please describe):


Q6 ) For which application domain(s)?

e.g. Telecom, Transport, Banking, etc.


Q7 ) Do you have a system that you would like to have evaluated or did you develop an approach that you would like to test (possibly using a borrowed system)?

Yes
No

If your answer is No please go to question Q9.


Q8 ) What kind of system would you like to present to an evaluation campaign?

a whole system
only a module of a larger system

This system is a (please select all items that apply to your case):

result of in-house development
result of non-proprietary module integration
system from another company/institution
public domain system


Q9 ) What benefits do you expect to gain from participating (please select all that apply)?

money
advertisement, visibility, image
new contacts
information on the competition
a better understanding of the technology
other (please describe):


Q10 ) Any other comment?



III ) TECHNOLOGY USER

If you are a technology user (as indicated in your answer to question Q2), please give your opinion on the following points; otherwise, skip to section IV, INFRASTRUCTURE, starting at question Q19.


Q11 ) Would you like to have access to the results of comparative technology evaluations?

Yes
No


Q12 ) Would you commission the evaluation of a technology you need for a specific application?

Yes
No


Q13 ) Would you commission the evaluation of a technology to get general information on the current state of the art?

Yes
No


Q14 ) Would you be ready to finance such a technology evaluation?

Yes, up to ECUs per technology
No


Q15 ) Would you agree to share the financial effort with other entities, in return for sharing the information obtained from the technology evaluation?

Yes, any entity
Yes, but not with my competitors
No, only for my own use


Q16 ) Would you be interested in sitting on an advisory board of such an evaluation infrastructure?

Yes
No

If you answered No, please skip the next question and go to question Q18.


Q17 ) Would you be ready to support the infrastructure financially, in order to sit on that advisory board?

Yes, by contributing up to ECUs per year.
No


Q18 ) Any other comment?

IV ) INFRASTRUCTURE



Q19 ) Do you think core technology evaluation is:

More important than specific field evaluation
As important as specific field evaluation
Less important than specific field evaluation


Q20 ) Do you think that setting up an evaluation infrastructure is part of the role of the European Commission, or rather that it should be purely independent?

Part of the EC's role.
Purely independent (no connection with the EC).


Q21 ) In your opinion, what is the best way of implementing evaluation?

through specific time-limited projects, each for a different evaluation
through a time-limited project, global for all sorts of evaluations
by creating an evaluation agency
through contracts with private companies, each for a different evaluation
through a contract with one private company for all sorts of evaluations
through a professional association


Q22 ) Any other comment?




Thank you very much for contributing to ELSE's efforts towards a better understanding of Language Engineering.



V ) ANNEX: A short evaluation glossary


COMPARATIVE EVALUATION EXERCISE:
The concept of comparing various systems/approaches on a common task using common data and common metrics, according to a common calendar and a common protocol.
CONTROL TASK:
The information transformation/production process that all the systems participating in a comparative evaluation exercise perform.
CORE TECHNOLOGY EVALUATION:
The paradigm of evaluation applied to a technology essential to a domain (e.g. Speech recognition is a core technology for Spoken Language Dialog Systems).
EVALUATION:
The assessment of the various pros and cons of a system/approach, made in a reproducible, systematic, scientific and transparent way.
EVALUATION CAMPAIGN:
An instance of a comparative evaluation exercise, with all its components (data, metrics, control task, protocol, participants, etc.) identified.
EVALUATION INFRASTRUCTURE:
Everything that is essential for supporting the implementation of evaluation campaigns (evaluators, language resource providers, data, organizers, metrics, evaluation protocols, etc.).
FIELD EVALUATION:
Domain specific application evaluation with end-user involvement.
POS TAGGING:
A control task where the information production process consists of assigning Part-Of-Speech tags to the words of an arbitrary piece of text (e.g. John loves Marie. ---> John/NOUN-PROPER loves/VERB Marie/NOUN-PROPER).
SPEECH TRANSCRIPTION:
Automatic transcoding of an acoustic signal into electronic text.
SPELL CHECKING:
(Semi-)automatic correction of spelling errors.
TECHNOLOGY EVALUATION:
Evaluation that addresses the problem of testing a given technology, trying to assess its range of performance and its suitability for solving a particular problem (e.g. DARPA/NIST evaluation exercises showed that Hidden Markov Models were the best-performing technology for Speech Recognition).
VIDEO INDEXING:
Automatic indexing of video (image & sound) data for future retrieval (generally done by image analysis or sound track transcription).