Paper: (Semi-)Automatic Detection Of Errors In PoS-Tagged Corpora

ACL ID C02-1021
Title (Semi-)Automatic Detection Of Errors In PoS-Tagged Corpora
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2002

This paper presents a simple technique that is nonetheless very efficient in practice. It automatically detects those positions in a part-of-speech tagged corpus where an error is to be suspected. The approach is based on learning and subsequently applying "negative bigrams", i.e. pairs of adjacent tags which constitute an incorrect configuration in a text of a particular language (in English, e.g., the bigram ARTICLE - FINITE VERB). Further, the paper describes the generalization of "negative bigrams" into "negative n-grams", for any natural number n, which provides a powerful tool for error detection in a corpus. The implementation is also discussed, as is an evaluation of the results of the approach when used for error detection in the NEGRA® corpus o...
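The abstract describes the method only at a high level. The following Python sketch illustrates the steps it names: learning negative bigrams, applying them to a tagged text, and matching longer negative n-grams. It rests on a hypothetical setup. Sentences are plain lists of tag strings, and the tag names (ART, ADJ, NOUN, VFIN) are invented for illustration. Candidate negative bigrams are assumed to be tag pairs never seen in a trusted, hand-checked sample. A negative n-gram is read simply as a forbidden contiguous tag sequence. The paper's actual learning procedure and its definition of negative n-grams may differ.

```python
from itertools import product


def observed_bigrams(corpus):
    """Return the set of adjacent tag pairs occurring in a tagged corpus."""
    seen = set()
    for tags in corpus:
        seen.update(zip(tags, tags[1:]))
    return seen


def candidate_negative_bigrams(tagset, trusted_corpus):
    """Return tag pairs never observed in the trusted corpus.

    Assumption (not from the paper): unseen pairs are only *candidates*;
    each would still be vetted by hand before being used as a negative bigram.
    """
    seen = observed_bigrams(trusted_corpus)
    return {pair for pair in product(sorted(tagset), repeat=2) if pair not in seen}


def suspicious_positions(tags, negative_ngrams):
    """Return sorted (start, ngram) pairs for windows matching a negative n-gram.

    Negative bigrams are simply the n-grams of length 2 in this sketch.
    """
    hits = []
    for n in sorted({len(g) for g in negative_ngrams}):
        for i in range(len(tags) - n + 1):
            window = tuple(tags[i:i + n])
            if window in negative_ngrams:
                hits.append((i, window))
    return sorted(hits)


if __name__ == "__main__":
    # Toy trusted sample with hypothetical tags.
    trusted = [["ART", "NOUN", "VFIN"], ["NOUN", "VFIN", "ART", "NOUN"]]
    print(sorted(candidate_negative_bigrams({"ART", "NOUN", "VFIN"}, trusted)))

    # Hand-picked negative patterns: a bigram and a longer trigram.
    negative = {("ART", "VFIN"), ("ART", "ADJ", "VFIN")}
    print(suspicious_positions(["ART", "ADJ", "VFIN", "NOUN"], negative))
    # prints [(0, ('ART', 'ADJ', 'VFIN'))]
```

In this toy run, (NOUN, NOUN) appears among the candidates only because the trusted sample is tiny, even though noun-noun sequences are grammatical in English. This is why the sketch treats unseen pairs as candidates for review rather than as errors. The trigram case shows why longer patterns can help. An article separated from a finite verb by an adjective escapes the ARTICLE - FINITE VERB bigram but is caught by the longer sequence.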