Paper: Mandarin Part-of-Speech Tagging and Discriminative Reranking

ACL ID D07-1117
Title Mandarin Part-of-Speech Tagging and Discriminative Reranking
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007

We present in this paper methods to improve HMM-based part-of-speech (POS) tagging of Mandarin. We model the emission probability of an unknown word using all the characters in the word, and enrich the standard left-to-right trigram estimation of word emission probabilities with a right-to-left prediction of the word by making use of the current and next tags. In addition, we utilize the RankBoost-based reranking algorithm to rerank the N-best outputs of the HMM-based tagger using various n-gram, morphological, and dependency features. Two methods are proposed to improve the generalization performance of the reranking algorithm. Our reranking model achieves an accuracy of 94.68% using n-gram and morphological features on the Penn Chinese Treebank 5.2, and is able to further im...
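
The N-best reranking step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that each candidate tag sequence arrives with its HMM log-probability and a dictionary of feature values, and that a weight per feature has already been learned (for example by a RankBoost-style trainer, which is not shown). The names `rerank`, `base_weight`, and the feature strings are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A candidate is (tag sequence, HMM log-probability, feature values).
# This layout is an assumption made for the sketch, not the paper's format.
Candidate = Tuple[List[str], float, Dict[str, float]]


def rerank(candidates: List[Candidate],
           weights: Dict[str, float],
           base_weight: float = 1.0) -> Candidate:
    """Return the candidate with the highest linear reranking score.

    The score is the weighted HMM log-probability plus the weighted sum
    of the candidate's features; features without a weight count as zero.
    """
    def score(candidate: Candidate) -> float:
        _tags, log_prob, features = candidate
        feature_score = sum(weights.get(name, 0.0) * value
                            for name, value in features.items())
        return base_weight * log_prob + feature_score

    return max(candidates, key=score)


# Hypothetical 2-best list for a two-word sentence.
n_best: List[Candidate] = [
    (["NN", "VV"], -4.0, {"bigram:NN_VV": 1.0}),
    (["NN", "NN"], -3.5, {"bigram:NN_NN": 1.0}),
]
# Hypothetical learned weights.
learned_weights = {"bigram:NN_VV": 1.0, "bigram:NN_NN": -0.5}

best_tags = rerank(n_best, learned_weights)[0]
baseline_tags = rerank(n_best, {})[0]  # no features: HMM order is kept
```

With these illustrative weights the reranker prefers the first candidate even though the HMM alone ranks the second one higher, which is the kind of correction a reranking stage is meant to make.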