Paper: Error Mining For Wide-Coverage Grammar Engineering

ACL ID P04-1057
Title Error Mining For Wide-Coverage Grammar Engineering
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2004

Parsing systems which rely on hand-coded linguistic descriptions can only perform adequately in as far as these descriptions are correct and complete. The paper describes an error mining technique to discover problems in hand-coded linguistic descriptions for parsing such as grammars and lexicons. By analysing parse results for very large unannotated corpora, the technique discovers missing, incorrect or incomplete linguistic descriptions. The technique uses the frequency of n-grams of words for arbitrary values of n. It is shown how a new combination of suffix arrays and perfect hash finite automata allows an efficient implementation.
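
The idea of mining parse results by n-gram frequency can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: that parse results are available as (sentence, parsed_ok) pairs, that an n-gram is scored by the share of its occurrences falling in successfully parsed sentences, and that n is capped at a small max_n. The names ngrams, parsability, max_n and min_count are hypothetical. For simplicity the sketch counts with in-memory dictionaries; it does not use the suffix arrays or perfect hash finite automata that the paper relies on for efficiency on very large corpora.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-gram of `tokens` as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def parsability(results, max_n=3, min_count=2):
    """Rank n-grams by the share of their occurrences in parsed sentences.

    `results` is a list of (sentence, parsed_ok) pairs (an assumed input
    format). N-grams seen fewer than `min_count` times are dropped, since
    a ratio over one or two observations says little.
    """
    total = Counter()  # occurrences of each n-gram in all sentences
    ok = Counter()     # occurrences in successfully parsed sentences only
    for sentence, parsed_ok in results:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                total[gram] += 1
                if parsed_ok:
                    ok[gram] += 1
    scores = {g: ok[g] / c for g, c in total.items() if c >= min_count}
    # Lowest ratio first; among ties, the more frequent n-gram comes first.
    return sorted(scores.items(), key=lambda kv: (kv[1], -total[kv[0]]))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_results = [
        ("the cat sleeps", True),
        ("the dog sleeps", True),
        ("the cat snores loudly", False),
        ("a dog snores loudly", False),
    ]
    for gram, score in parsability(toy_results, max_n=2):
        print(" ".join(gram), round(score, 2))
```

N-grams at the top of the ranking occur mostly or only in sentences that failed to parse, so they are candidates for inspection as signs of a missing or incorrect lexical entry or grammar rule. The fixed max_n is a shortcut of this sketch; the abstract describes n-grams for arbitrary values of n.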