Paper: Difficult Cases: From Data to Learning, and Back

ACL ID P14-2064
Title Difficult Cases: From Data to Learning, and Back
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

This article contributes to the ongoing discussion in the computational linguistics community regarding instances that are difficult to annotate reliably. Is it worthwhile to identify those? What information can be inferred from them regarding the nature of the task? What should be done with them when building supervised machine learning systems? We address these questions in the context of a subjective semantic task. In this setting, we show that the presence of such instances in training data misleads a machine learner into misclassifying clear-cut cases. We also show that, by considering machine learning outcomes with and without the difficult cases, it is possible to identify specific weaknesses of the problem representation.