Paper: Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries

ACL ID D12-1075
Title Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2012
Authors

Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MIN-GREEDY algorithm (Ravi et al., 2010) as a starting point, we improve it with several intuitive heuristics. We also define a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data. Altogether, our augmentations produce improvements to performance over the original MIN-G...