Paper: Compacting the Penn Treebank Grammar

ACL ID C98-1111
Title Compacting the Penn Treebank Grammar
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1998

Treebanks, such as the Penn Treebank (PTB), offer a simple approach to obtaining a broad coverage grammar: one can simply read the grammar off the parse trees in the treebank. While such a grammar is easy to obtain, a square-root rate of growth of the rule set with corpus size suggests that the derived grammar is far from complete and that much more treebanked text would be required to obtain a complete grammar, if one exists at some limit. However, we offer an alternative explanation in terms of the underspecification of structures within the treebank. This hypothesis is explored by applying an algorithm to compact the derived grammar by eliminating redundant rules - rules whose right hand sides can be parsed by other rules. The size of the resulting compac...
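
The redundancy test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a rule is a pair of a left-hand side and a tuple of right-hand-side categories, treats a rule as redundant when its right-hand side can be parsed to its left-hand side using only the other rules still in the grammar, and visits the longest rules first. The visiting order, the function names (`tiles`, `parsable`, `compact`), and the toy grammar are all assumptions made for illustration.

```python
def tiles(rhs, i, j, chart):
    """Return True if the symbols in rhs can cover span i..j, one chart entry each."""
    frontier = {i}
    for sym in rhs:
        frontier = {end for start in frontier
                    for end in range(start + 1, j + 1)
                    if sym in chart.get((start, end), ())}
        if not frontier:
            return False
    return j in frontier


def parsable(target, seq, rules):
    """Return True if the category sequence seq can be parsed as target using rules."""
    n = len(seq)
    chart = {(i, i + 1): {seq[i]} for i in range(n)}
    changed = True
    while changed:  # iterate to a fixpoint so rule order does not matter
        changed = False
        for lhs, rhs in rules:
            for i in range(n):
                for j in range(i + 1, n + 1):
                    cell = chart.setdefault((i, j), set())
                    if lhs not in cell and tiles(rhs, i, j, chart):
                        cell.add(lhs)
                        changed = True
    return target in chart.get((0, n), set())


def compact(rules):
    """Drop every rule whose right-hand side the remaining rules can already parse."""
    kept = set(rules)
    # Assumption: longest rules are tested first; the abstract does not fix an order.
    for rule in sorted(rules, key=lambda r: (-len(r[1]), r)):
        lhs, rhs = rule
        if parsable(lhs, rhs, kept - {rule}):
            kept.discard(rule)
    return kept


# Hypothetical toy grammar, not taken from the Penn Treebank.
toy = {
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("PP", ("IN", "NP")),
    ("NP", ("DT", "NN", "PP")),  # covered by NP -> DT NN and NP -> NP PP
}
compacted = compact(toy)
for lhs, rhs in sorted(compacted):
    print(lhs, "->", " ".join(rhs))
```

In this toy grammar the flat rule `NP -> DT NN PP` is removed because `DT NN` already parses as `NP` and `NP PP` parses as `NP`, while the three shorter rules survive because nothing else can derive them.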