Paper: Crouching Dirichlet Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

ACL ID D10-1020
Title Crouching Dirichlet Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors

We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distributions once over all states and it is known to perform poorly in unsupervised and semi-supervised POS tagging. This modification significantly improves unsupervised POS tagging performance across several measures on five data sets for four languages. We also show that simply using different hyperparameter values for content and function word states in a standard HMM (which we call HMM+) is surprisingly effective.
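
The HMM+ idea of giving content and function word states different hyperparameters can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes symmetric Dirichlet priors over each state's word emission distribution, and the function names, the split of states into the two classes, and the hyperparameter values are all hypothetical choices made for illustration.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension `size`
    by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_emissions(num_content, num_function, vocab_size,
                     alpha_content, alpha_function, seed=0):
    """Draw one emission distribution per state, using a class-specific
    hyperparameter (assumed layout: content states first, then function states)."""
    rng = random.Random(seed)
    alphas = [alpha_content] * num_content + [alpha_function] * num_function
    return [sample_dirichlet(a, vocab_size, rng) for a in alphas]


# Illustrative values only; they are not taken from the paper.
emissions = sample_emissions(num_content=3, num_function=2, vocab_size=50,
                             alpha_content=1.0, alpha_function=0.1)
```

A smaller hyperparameter concentrates a state's probability mass on fewer word types, which is one plausible way to encode the expectation that function word states cover a small closed vocabulary while content word states cover a large open one.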