Paper: Unsupervised Grammar Induction By Distribution And Attachment

ACL ID: W06-2916
Title: Unsupervised Grammar Induction By Distribution And Attachment
Venue: International Conference on Computational Natural Language Learning
Session: Main Conference
Year: 2006

Distributional approaches to grammar induction are typically inefficient, enumerating large numbers of candidate constituents. In this paper, we describe a simplified model of distributional analysis which uses heuristics to reduce the number of candidate constituents under consideration. We apply this model to a large corpus of over 400,000 words of written English, and evaluate the results using EVALB. We show that the performance of this approach is limited, providing a detailed analysis of learned structure and a comparison with actual constituent-context distributions. This motivates a more structured approach, using a process of attachment to form constituents from their distributional components. Our findings suggest that distributional methods do not generalize enough...
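To make "constituent-context distributions" concrete, the sketch below shows one generic way such distributions can be collected: every contiguous span of tokens is treated as a candidate constituent, and its context is the pair of tokens immediately to its left and right, with sentence boundaries marked by "#". This is not the paper's method; the specific heuristics the authors use to prune candidates are not described here. The names `constituent_contexts`, `filter_by_frequency`, and `cosine`, and the parameters `max_len` and `min_count`, are hypothetical illustrations of a simple length cap and frequency cutoff.

```python
import math
from collections import Counter, defaultdict


def constituent_contexts(sentences, max_len=3):
    """Map each candidate span (tuple of tokens) to a Counter of (left, right) contexts.

    Hypothetical sketch: spans longer than max_len are never enumerated,
    which is one crude way to limit the number of candidates.
    """
    contexts = defaultdict(Counter)
    for sent in sentences:
        # Pad with boundary markers so every span has a left and right neighbour.
        padded = ["#"] + list(sent) + ["#"]
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                span = tuple(sent[i:j])
                # sent[i:j] sits at padded[i+1 .. j], so its neighbours are padded[i] and padded[j+1].
                left, right = padded[i], padded[j + 1]
                contexts[span][(left, right)] += 1
    return contexts


def filter_by_frequency(contexts, min_count):
    """Keep only spans observed at least min_count times (hypothetical pruning heuristic)."""
    return {span: ctx for span, ctx in contexts.items() if sum(ctx.values()) >= min_count}


def cosine(a, b):
    """Cosine similarity between two context distributions stored as Counters."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Toy part-of-speech sequences, invented for illustration only.
    corpus = [["DT", "NN", "VBD"], ["DT", "NN", "VBD", "DT", "NN"]]
    ctx = constituent_contexts(corpus, max_len=2)
    frequent = filter_by_frequency(ctx, min_count=2)
    print(ctx[("DT", "NN")])
    print(sorted(frequent))
    print(cosine(ctx[("DT", "NN")], ctx[("NN",)]))
```

Comparing spans by the similarity of their context Counters is what lets a distributional learner group, for example, a determiner-noun sequence with other noun-phrase-like spans; the enumeration loop also shows why such approaches are costly, since the number of candidate spans grows with sentence length unless some pruning is applied.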