Paper: Probabilistic CFG With Latent Annotations

ACL ID P05-1010
Title Probabilistic CFG With Latent Annotations
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2005

This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. This model is an extension of PCFG in which non-terminal symbols are augmented with latent variables. Fine-grained CFG rules are automatically induced from a parsed corpus by training a PCFG-LA model using an EM-algorithm. Because exact parsing with a PCFG-LA is NP-hard, several approximations are described and empirically compared. In experiments using the Penn WSJ corpus, our automatically trained model gave a performance of 86.6% (F1, sentences ≤ 40 words), which is comparable to that of an unlexicalized PCFG parser created using extensive manual feature selection.
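
To make the idea of non-terminals augmented with latent variables concrete, the following is a minimal sketch, not the paper's implementation. It assumes that the probability of an observed parse tree is obtained by summing over every assignment of latent values to its non-terminal nodes, and it computes that sum bottom-up. The tree encoding, the names H, root_prob, binary_prob, lexical_prob, inside and tree_prob, and all numbers are hypothetical toy choices for illustration.

```python
from itertools import product

H = 2  # assumed number of latent values per non-terminal (toy setting)

# Hypothetical toy parameters; none of these numbers come from the paper.
# root_prob[(A, x)]: probability that the root is A annotated with x.
root_prob = {("S", 0): 0.6, ("S", 1): 0.4}

# binary_prob[(A, x, B, y, C, z)]: probability of the rule A[x] -> B[y] C[z].
binary_prob = {}
for x, y, z in product(range(H), repeat=3):
    binary_prob[("S", x, "NP", y, "VP", z)] = 0.25  # uniform over (y, z)

# lexical_prob[(A, x, w)]: probability of the rule A[x] -> w.
lexical_prob = {
    ("NP", 0, "she"): 0.9, ("NP", 1, "she"): 0.2,
    ("VP", 0, "runs"): 0.5, ("VP", 1, "runs"): 0.1,
}


def inside(node):
    """Return a list b where b[x] is the probability of the subtree below
    `node`, given that the node's label carries latent value x."""
    label = node[0]
    if len(node) == 2:  # preterminal: (label, word)
        word = node[1]
        return [lexical_prob.get((label, x, word), 0.0) for x in range(H)]
    _, left, right = node  # binary node: (label, left_subtree, right_subtree)
    b_left, b_right = inside(left), inside(right)
    return [
        sum(
            binary_prob.get((label, x, left[0], y, right[0], z), 0.0)
            * b_left[y] * b_right[z]
            for y in range(H) for z in range(H)
        )
        for x in range(H)
    ]


def tree_prob(tree):
    """Probability of the observed (unannotated) tree: the latent values
    at every node are summed out."""
    b = inside(tree)
    return sum(root_prob.get((tree[0], x), 0.0) * b[x] for x in range(H))


tree = ("S", ("NP", "she"), ("VP", "runs"))
print(tree_prob(tree))
```

Summing out the annotations of a single fixed tree is cheap, as above. This sketch does not address parsing, that is, searching over trees for a sentence, which is the problem the abstract describes as NP-hard and handles by approximation.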