Paper: Human Judgement as a Parameter in Evaluation Campaigns

The relevance of human judgment in an evaluation campaign is illustrated here through the DEFT text mining campaigns. In a first step, testing a topic for a campaign among a limited number of human evaluators informs us about the feasibility of a task. This information comes from the results obtained by the judges, as well as from their personal impressions after taking the test. In a second step, results from individual judges, as well as their pairwise matching, are used to adjust the task (choice of a marking scale for DEFT’07 and selection of topical categories for DEFT’08). Finally, the mutual comparison of competitors’ results, at the end of the evaluation campaign, confirms the choices we made at its starting point, and provides means to redefine the task w...