Paper: The infinite HMM for unsupervised PoS tagging

ACL ID D09-1071
Title The infinite HMM for unsupervised PoS tagging
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009

We extend previous work on fully unsupervised part-of-speech tagging. Using a non-parametric version of the HMM, called the infinite HMM (iHMM), we address the problem of choosing the number of hidden states in unsupervised Markov models for PoS tagging. We experiment with two non-parametric priors, the Dirichlet and Pitman-Yor processes, on the Wall Street Journal dataset using a parallelized implementation of an iHMM inference algorithm. We evaluate the results with a variety of clustering evaluation metrics and achieve equivalent or better performances than previously reported. Building on this promising result we evaluate the output of the unsupervised PoS tagger as a direct replacement for the output of a fully supervised PoS tagger for the task of shallow parsing ...
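
The two priors named in the abstract can be illustrated with the standard Chinese restaurant process construction, in which the number of clusters (by analogy, hidden states) is not fixed in advance. The sketch below is not the paper's iHMM inference algorithm; the function name `pitman_yor_crp` and its parameters are hypothetical, chosen for illustration. Setting `discount=0` gives the Dirichlet process case, and a discount in (0, 1) gives the Pitman-Yor case.

```python
import random


def pitman_yor_crp(n_customers, concentration, discount=0.0, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    Illustrative only: with n customers already seated at k tables, the
    next customer joins an existing table with weight (size - discount)
    and opens a new table with weight (concentration + discount * k).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    if concentration <= 0.0:
        raise ValueError("this sketch assumes concentration > 0")

    rng = random.Random(seed)
    table_sizes = []
    for _ in range(n_customers):
        k = len(table_sizes)
        # Unnormalized weights: one per existing table, plus one for a new table.
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            table_sizes.append(1)      # open a new table (a new cluster)
        else:
            table_sizes[choice] += 1   # join an existing table
    return table_sizes


if __name__ == "__main__":
    dp = pitman_yor_crp(1000, concentration=1.0, discount=0.0, seed=1)
    py = pitman_yor_crp(1000, concentration=1.0, discount=0.5, seed=1)
    print("Dirichlet process tables:", len(dp))
    print("Pitman-Yor tables:", len(py))
```

Running the script prints how many clusters each prior opened for the same number of items. The discount parameter is what distinguishes the Pitman-Yor process: it takes weight away from existing tables and gives it to the new-table option.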