ACL Anthology Network (All About NLP) (beta) The Association Of Computational Linguistics Anthology Network |
ACL ID | P10-2039 |
---|---|
Title | Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging |
Venue | Annual Meeting of the Association of Computational Linguistics |
Session | Short Paper |
Year | 2010 |
Authors |
The Minimum Description Length (MDL) principle is a method for model selection that trades off between the explanation of the data by the model and the complexity of the model itself. Inspired by the MDL principle, we develop an objective function for generative models that captures the description of the data by the model (log-likelihood) and the description of the model (model size). We also develop an efficient general search algorithm based on the MAP-EM framework to optimize this function. Since recent work has shown that minimizing the model size in a Hidden Markov Model for part-of-speech (POS) tagging leads to higher accuracies, we test our approach by applying it to this problem. The search algorithm involves a simple change to EM and achieves high POS tagging accuracies on...
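
The trade-off the abstract describes, data fit against model size, can be illustrated with a toy sketch. This is not the paper's objective or its search algorithm: the abstract does not give the exact form, so the code below assumes a single categorical distribution, measures model size as the number of nonzero parameters, and uses a hypothetical weight `alpha`. All names are illustrative.

```python
import math


def mdl_style_objective(counts, probs, alpha=1.0, eps=1e-12):
    """Toy objective: log-likelihood minus alpha times model size.

    Assumption: model size is the number of parameters above eps.
    This is an illustrative stand-in, not the paper's definition.
    """
    # Description of the data by the model: log-likelihood of the counts.
    log_likelihood = sum(
        c * math.log(probs[k]) for k, c in counts.items() if c > 0
    )
    # Description of the model: how many parameters are effectively nonzero.
    model_size = sum(1 for p in probs.values() if p > eps)
    return log_likelihood - alpha * model_size


# Observed counts for two outcomes; outcome "c" never occurs.
counts = {"a": 3, "b": 1}

# A dense model spends probability mass (and a parameter) on the unseen "c".
dense = {"a": 0.7, "b": 0.2, "c": 0.1}
# A sparse model zeroes out "c", so it fits better and is smaller.
sparse = {"a": 0.75, "b": 0.25, "c": 0.0}

print(mdl_style_objective(counts, dense))
print(mdl_style_objective(counts, sparse))
```

Under these assumptions the sparse model scores higher on both terms, which is the kind of preference for small models the abstract refers to. A count of nonzero parameters is not differentiable, so a practical optimizer would need some smooth surrogate for it; the sketch only evaluates the objective and does not attempt the search.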