Paper: Better Informed Training Of Latent Syntactic Features

ACL ID W06-1638
Title Better Informed Training Of Latent Syntactic Features
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2006

We study unsupervised methods for learning refinements of the nonterminals in a treebank. Following Matsuzaki et al. (2005) and Prescher (2005), we may, for example, split NP without supervision into NP[0] and NP[1], which behave differently. We first propose to learn a PCFG that adds such features to nonterminals in such a way that they respect patterns of linguistic feature passing: each node’s nonterminal features are either identical to, or independent of, those of its parent. This linguistic constraint reduces runtime and the number of parameters to be learned. However, it did not yield improvements when training on the Penn Treebank. An orthogonal strategy was more successful: to improve the performance of the EM learner by treebank preprocessing and by annealing met...
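
To make the idea of nonterminal refinement concrete, the following is a minimal sketch in Python. It is not the paper's implementation. The function names `split_rule` and `inherit_or_independent`, the example rule NP -> DT NN, and the choice of two latent values per nonterminal are all assumptions made for illustration. The first function enumerates every annotated version of a single treebank rule; the second lists the options open to a child node when its feature must be either identical to, or independent of, its parent's.

```python
from itertools import product


def split_rule(parent, children, num_values=2):
    """Enumerate every latent-annotated version of one treebank rule.

    Hypothetical helper: each symbol X becomes X[0], ..., X[num_values - 1],
    so a rule with k symbols in total yields num_values ** k refined rules.
    """
    symbols = [parent, *children]
    refined = []
    for values in product(range(num_values), repeat=len(symbols)):
        annotated = [f"{sym}[{v}]" for sym, v in zip(symbols, values)]
        refined.append((annotated[0], tuple(annotated[1:])))
    return refined


def inherit_or_independent(parent_value, num_values=2):
    """List a child's options under the feature-passing constraint.

    Toy illustration only (an assumption, not the paper's parameterization):
    the child either copies the parent's latent value or picks a value
    without reference to the parent.
    """
    choices = [("inherit", parent_value)]
    choices += [("independent", v) for v in range(num_values)]
    return choices


rules = split_rule("NP", ["DT", "NN"])
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))

print(inherit_or_independent(parent_value=1))
```

With two latent values, the single rule NP -> DT NN expands into eight refined rules, which shows how quickly an unconstrained refinement multiplies the parameters that EM must estimate.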