Paper: Using Question Series To Evaluate Question Answering System Effectiveness

ACL ID: H05-1038
Title: Using Question Series To Evaluate Question Answering System Effectiveness
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2005
Authors:
  • Ellen M. Voorhees (National Institute of Standards and Technology, Gaithersburg, MD)

The original motivation for using question series in the TREC 2004 question answering track was the desire to model aspects of dialogue processing in an evaluation task that included different question types. The structure introduced by the series also proved to have an important additional benefit: the series is at an appropriate level of granularity for aggregating scores for an effective evaluation. The series is small enough to be meaningful at the task level, since it represents a single user interaction, yet it is large enough to avoid the highly skewed score distributions exhibited by single questions. An analysis of the reliability of the per-series evaluation shows that the evaluation is stable for the differences in scores seen in the track. The development of question an...
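The abstract's central claim concerns the level at which scores are aggregated. The following is a minimal sketch for illustration only, not the track's scoring procedure. It assumes every question receives a score in [0, 1] and that a series score is the unweighted mean of its question scores; the actual per-series combination of question types may differ. The helper names `series_scores` and `system_score`, the series IDs, and the scores are all hypothetical.

```python
from statistics import mean


def series_scores(question_scores: dict[str, list[float]]) -> dict[str, float]:
    """Aggregate per-question scores into one score per series.

    Hypothetical assumption: a series score is the unweighted mean of its
    question scores (the real track weighting may differ).
    """
    return {sid: mean(scores) for sid, scores in question_scores.items()}


def system_score(question_scores: dict[str, list[float]]) -> float:
    """Overall run score: the mean of the per-series scores, so every
    series (one simulated user interaction) counts equally, no matter how
    many questions it contains."""
    return mean(series_scores(question_scores).values())


# Hypothetical run: factoid questions are judged right (1) or wrong (0),
# so single-question scores sit only at the two extremes.
run = {
    "S1": [1, 0, 1, 1],
    "S2": [0, 0, 1],
    "S3": [1, 1, 1, 0, 0],
}

print(series_scores(run))
print(round(system_score(run), 3))
```

In this toy run the per-question scores take only the values 0 and 1, while the series means (0.75, about 0.33, and 0.6) spread across the interval, which is the skew-reduction effect the abstract attributes to series-level aggregation.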