Paper: PCFG Induction for Unsupervised Parsing and Language Modelling

ACL ID D14-1141
Title PCFG Induction for Unsupervised Parsing and Language Modelling
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

The task of unsupervised induction of probabilistic context-free grammars (PCFGs) has attracted a lot of attention in the field of computational linguistics. Although it is a difficult task, work in this area is still very much in demand since it can contribute to the advancement of language parsing and modelling. In this work, we describe a new algorithm for PCFG induction based on a principled approach and capable of inducing accurate yet compact artificial natural language grammars and typical context-free grammars. Moreover, this algorithm can work on large grammars and datasets and infers correctly even from small samples. Our analysis shows that the type of grammar induced by our algorithm is, in theory, capable of modelling natural language. One of our experiments shows that our...