Paper: OntoNotes: Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation

ACL ID C08-1133
Title OntoNotes: Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2008
Authors

Annotated corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, no one has investigated how to automatically determine exemplars in which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations, including word senses, predicate-argument structure, ontology linking, and coreference. To determine the mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. Experiments are conducted from three aspects (precision, cost-effectiveness ratio, and entropy) to examine the performance of WSD. The experime...
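The candidate-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a WSD model that outputs a probability for each sense, flags instances where the model's top sense differs from the sense the annotators agreed on, and orders the flagged instances by the entropy of the model's distribution. The data layout, the function names, and the use of entropy as a ranking key are all assumptions made for illustration; the abstract names entropy only as one aspect of the evaluation.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a sense probability distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def suspicious_candidates(instances):
    """Flag instances where the agreed gold sense differs from the WSD prediction.

    Each instance is a dict with hypothetical keys (not from the paper):
      'id'    : instance identifier
      'gold'  : sense label the annotators agreed on
      'probs' : dict mapping sense label -> WSD model probability
    """
    flagged = []
    for inst in instances:
        # The WSD prediction is the sense with the highest probability.
        predicted = max(inst["probs"], key=inst["probs"].get)
        if predicted != inst["gold"]:
            flagged.append(
                {
                    "id": inst["id"],
                    "gold": inst["gold"],
                    "predicted": predicted,
                    "entropy": entropy(inst["probs"]),
                }
            )
    # Assumed ordering: lowest entropy first, i.e. the disagreements
    # where the model's distribution is most peaked.
    flagged.sort(key=lambda c: c["entropy"])
    return flagged


if __name__ == "__main__":
    toy = [
        {"id": "a", "gold": "s1", "probs": {"s1": 0.9, "s2": 0.1}},
        {"id": "b", "gold": "s1", "probs": {"s1": 0.4, "s2": 0.6}},
        {"id": "c", "gold": "s2", "probs": {"s1": 0.95, "s2": 0.05}},
    ]
    for cand in suspicious_candidates(toy):
        print(cand["id"], cand["gold"], cand["predicted"], round(cand["entropy"], 3))
```

In this toy input, instance "a" is not flagged because the model agrees with the annotators. The flagged list would then go to a human adjudicator, which matches the abstract's description of selecting candidates "for human evaluation".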