Paper: Re-estimation of Lexical Parameters for Treebank PCFGs

ACL ID C08-1025
Title Re-estimation of Lexical Parameters for Treebank PCFGs
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2008
Authors

We present procedures which pool lexical information estimated from unlabeled data via the Inside-Outside algorithm, with lexical information from a treebank PCFG. The procedures produce substantial improvements (up to 31.6% error reduction) on the task of determining subcategorization frames of novel verbs, relative to a smoothed Penn Treebank-trained PCFG. Even with relatively small quantities of unlabeled training data, the re-estimated models show promising improvements in labeled bracketing f-scores on Wall Street Journal parsing, and substantial benefit in acquiring the subcategorization preferences of low-frequency verbs.
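
As a reading aid for the "error reduction" figure quoted above, the following minimal sketch shows how a relative error reduction is conventionally computed from two error rates. It assumes the standard definition (baseline error minus new error, divided by baseline error). The function name and the error rates are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction of the baseline error removed by the new model."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates, for illustration only (not results from the paper).
baseline = 0.50      # assumed error rate of a baseline model
reestimated = 0.40   # assumed error rate of a re-estimated model

reduction = relative_error_reduction(baseline, reestimated)
print(f"Relative error reduction: {reduction:.1%}")
```

Under this definition, a model that lowers an error rate from 50% to 40% removes one fifth of the baseline errors, even though the absolute change is only ten percentage points.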