Paper: Automatic Detection and Correction of Errors in Dependency Treebanks

ACL ID P11-2060
Title Automatic Detection and Correction of Errors in Dependency Treebanks
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

Annotated corpora are essential for almost all NLP applications. Although they are expected to be of very high quality because of their importance for follow-up developments, they still contain a considerable number of errors. With this work we want to draw attention to this fact. Additionally, we try to estimate the number of errors and propose a method for their automatic correction. Although our approach finds only a portion of the errors that we suppose are contained in almost any annotated corpus due to the nature of its creation process, it has very high precision and is thus in any case beneficial for the quality of the corpus it is applied to. Finally, we compare it to a different method for error detection in treebanks and find out that ...