Paper: Self-Training PCFG Grammars with Latent Annotations Across Languages

ACL ID D09-1087
Title Self-Training PCFG Grammars with Latent Annotations Across Languages
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009
Authors

We investigate the effectiveness of self-training PCFG grammars with latent annotations (PCFG-LA) for parsing languages with different amounts of labeled training data. Compared to Charniak’s lexicalized parser, the PCFG-LA parser was more effectively adapted to a language for which parsing has been less well developed (i.e., Chinese) and benefited more from self-training. We show for the first time that self-training is able to significantly improve the performance of the PCFG-LA parser, a single generative parser, on both small and large amounts of labeled training data. Our approach achieves state-of-the-art parsing accuracies for a single parser on both English (91.5%) and Chinese (85.2%).
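The self-training procedure the abstract refers to can be sketched as a short loop: train a seed parser on gold trees, parse raw text with it, then retrain on the union of gold and automatically parsed data. The sketch below is a minimal illustration under assumed names; the `Parser` class is a hypothetical stand-in, not the PCFG-LA trainer used in the paper.

```python
# Minimal sketch of a generic self-training loop for parsing.
# Parser is a hypothetical stand-in: a real setup would wrap a
# PCFG-LA trainer, not this toy class.

class Parser:
    """Toy stand-in: 'training' just records the trees it was given."""
    def __init__(self):
        self.training_data = []

    def train(self, treebank):
        self.training_data = list(treebank)

    def parse(self, sentence):
        # A real parser would return its best tree for the sentence;
        # here we return a placeholder bracketing.
        return (sentence, "(S ...)")

def self_train(labeled, unlabeled):
    # 1. Train a seed parser on the gold treebank.
    parser = Parser()
    parser.train(labeled)
    # 2. Parse the unlabeled text to produce automatic trees.
    auto_trees = [parser.parse(s) for s in unlabeled]
    # 3. Retrain on gold trees plus the automatically parsed trees.
    parser.train(labeled + auto_trees)
    return parser

parser = self_train([("a gold sentence", "(S ...)")], ["some raw text"])
print(len(parser.training_data))  # → 2
```

In practice the gold treebank and the auto-parsed data may be weighted differently when retraining; the sketch treats them uniformly for simplicity.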