Paper: Self-Training with Products of Latent Variable Grammars

ACL ID D10-1002
Title Self-Training with Products of Latent Variable Grammars
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors Zhongqiang Huang, Mary Harper, Slav Petrov

We study self-training with products of latent variable grammars in this paper. We show that increasing the quality of the automatically parsed data used for self-training yields more accurate self-trained grammars. Our generative self-trained grammars reach F scores of 91.6 on the WSJ test set and surpass even discriminative reranking systems without self-training. Additionally, we show that multiple self-trained grammars can be combined in a product model to achieve even higher accuracy. The product model is most effective when the individual underlying grammars are most diverse. Combining multiple grammars that were self-trained on disjoint sets of unlabeled data results in a final test accuracy of 92.5% on the WSJ test set and 89.6% on our Broadcast News test set.
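The abstract describes two techniques: self-training (retraining a grammar on its own automatic parses of unlabeled text) and a product model that combines the scores several independently trained grammars assign to the same tree. The sketch below is a rough illustration of both ideas, not the paper's actual procedure: the real product model combines latent variable grammars inside the parser's dynamic program rather than reranking a fixed candidate list, and all names here (parse, retrain, log_prob) are hypothetical stand-ins.

```python
from math import fsum

def self_train(train_trees, unlabeled_sentences, parse, retrain):
    """One round of self-training: parse unlabeled text with the current
    parser, then retrain on the gold trees plus the automatic parses.
    `parse` and `retrain` are hypothetical callables standing in for the
    latent variable grammar parser and trainer."""
    auto_parsed = [parse(s) for s in unlabeled_sentences]
    return retrain(train_trees + auto_parsed)

def product_score(tree, grammar_logprobs):
    """Log score of `tree` under the product model: the product of the
    per-grammar probabilities, i.e. the sum of their log probabilities.
    `grammar_logprobs` holds one hypothetical log_prob(tree) function
    per self-trained grammar."""
    return fsum(lp(tree) for lp in grammar_logprobs)

def select_parse(candidates, grammar_logprobs):
    """Pick the candidate tree the product of grammars prefers."""
    return max(candidates, key=lambda t: product_score(t, grammar_logprobs))
```

Because the product rewards trees that all grammars agree on, it benefits most when the component grammars make diverse errors, which is consistent with the abstract's observation that self-training the grammars on disjoint sets of unlabeled data gives the best combined accuracy.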