Paper: Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification

ACL ID P98-1034
Title Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1998
Authors

Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Stree...
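
The two training steps and the matching step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not a detail taken from the abstract: the toy corpus, the function names, the definition of "benefit" as correct bracketings minus incorrect ones, the single pruning pass with a fixed threshold, and the use of greedy left-to-right longest match as the "naive heuristic".

```python
from collections import defaultdict

# Hypothetical toy corpus: each item is (POS tags, gold base NP spans),
# where a span is (start, end) with end exclusive.
TRAIN = [
    (["DT", "NN", "VBD", "DT", "JJ", "NN"], [(0, 2), (3, 6)]),
    (["VBG", "NN", "VBD", "JJ"], [(0, 2)]),
    (["NN", "VBD", "VBG", "NN"], [(0, 1), (3, 4)]),
    (["DT", "NN", "VBZ", "VBG", "NN"], [(0, 2), (4, 5)]),
]


def extract_rules(corpus):
    """Step 1: read the grammar, i.e. every tag sequence bracketed as a base NP."""
    return {tuple(tags[s:e]) for tags, spans in corpus for s, e in spans}


def match(tags, rules):
    """Assumed matching heuristic: greedy left-to-right longest match.

    Returns a list of (start, end, rule) triples.
    """
    max_len = max((len(r) for r in rules), default=0)
    found, i = [], 0
    while i < len(tags):
        for n in range(min(max_len, len(tags) - i), 0, -1):
            cand = tuple(tags[i:i + n])
            if cand in rules:
                found.append((i, i + n, cand))
                i += n
                break
        else:
            i += 1  # no rule starts here; move on one token
    return found


def benefit_scores(corpus, rules):
    """Assumed benefit: correct bracketings minus incorrect ones, per rule."""
    scores = defaultdict(int)
    for tags, spans in corpus:
        gold = set(spans)
        for s, e, rule in match(tags, rules):
            scores[rule] += 1 if (s, e) in gold else -1
    return scores


def prune(corpus, rules, threshold=1):
    """Step 2: keep only rules whose benefit reaches the (assumed) threshold."""
    scores = benefit_scores(corpus, rules)
    return {r for r in rules if scores[r] >= threshold}


if __name__ == "__main__":
    grammar = extract_rules(TRAIN)
    pruned = prune(TRAIN, grammar)
    print(sorted(grammar - pruned))  # rules discarded by pruning
    print(match(["NN", "VBD", "VBG", "NN"], pruned))
```

In this toy corpus the rule `VBG NN` is read from one sentence but produces wrong brackets in two others, so its benefit is negative and it is discarded; the remaining rules then bracket the third sentence correctly. A fuller treatment would evaluate benefit on held-out data and might repeat the pruning pass, which this sketch does not attempt.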