Paper: Adding Noun Phrase Structure to the Penn Treebank

ACL ID P07-1031
Title Adding Noun Phrase Structure to the Penn Treebank
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2007

The Penn Treebank does not annotate within base noun phrases (NPs), committing only to flat structures that ignore the complexity of English NPs. This means that tools trained on Treebank data cannot learn the correct internal structure of NPs. This paper details the process of adding gold-standard bracketing within each noun phrase in the Penn Treebank. We then examine the consistency and reliability of our annotations. Finally, we use this resource to determine NP structure using several statistical approaches, thus demonstrating the utility of the corpus. This adds detail to the Penn Treebank that is necessary for many NLP applications.
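
To make the contrast between a flat NP and an internally bracketed NP concrete, here is a minimal sketch in Python. The example phrase, the tuple representation, and the inner node label `INNER` are assumptions chosen for illustration; they are not taken from the abstract and do not claim to reproduce the paper's annotation scheme.

```python
# Illustrative sketch only: the phrase and the "INNER" label are hypothetical,
# not the paper's actual annotation scheme.

def to_brackets(tree):
    """Render a (label, child, ...) tuple as a Treebank-style bracket string."""
    if isinstance(tree, str):  # a leaf is a bare word
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(to_brackets(c) for c in children) + ")"

def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

# Flat NP: all three tokens are siblings, so the structure does not say
# whether "crude" modifies "oil" or "prices".
flat_np = ("NP", ("NN", "crude"), ("NN", "oil"), ("NNS", "prices"))

# Bracketed NP: an extra inner node groups "crude oil" as a unit.
bracketed_np = (
    "NP",
    ("INNER", ("NN", "crude"), ("NN", "oil")),
    ("NNS", "prices"),
)

flat_str = to_brackets(flat_np)
bracketed_str = to_brackets(bracketed_np)
print(flat_str)
print(bracketed_str)
```

Both trees cover the same words; only the second records which tokens group together, which is the kind of information a tool cannot learn from flat annotation.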