Paper: Human Judgement as a Parameter in Evaluation Campaigns

ACL ID W08-1204
Title Human Judgement as a Parameter in Evaluation Campaigns
Venue Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering
Year 2008

The relevance of human judgment in an evaluation campaign is illustrated here through the DEFT text mining campaigns. In a first step, testing a topic for a campaign among a limited number of human evaluators informs us about the feasibility of a task. This information comes from the results obtained by the judges, as well as from their personal impressions after passing the test. In a second step, results from individual judges, as well as their pairwise matching, are used in order to adjust the task (choice of a marking scale for DEFT’07 and selection of topical categories for DEFT’08). Finally, the mutual comparison of competitors’ results, at the end of the evaluation campaign, confirms the choices we made at its starting point, and provides means to redefine the task w...