Paper: Learning Accurate Compact And Interpretable Tree Annotation

ACL ID P06-1055
Title Learning Accurate Compact And Interpretable Tree Annotation
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2006
Authors

We present an automatic approach to tree annota- tion in which basic nonterminal symbols are alter- nately split and merged to maximize the likelihood of a training treebank. Starting with a simple X- bar grammar, we learn a new grammar whose non- terminals are subsymbols of the original nontermi- nals. In contrast with previous work, we are able to split various terminals to different degrees, as ap- propriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more ac- curate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, high...