Paper: Bootstrapping A Multilingual Part-Of-Speech Tagger In One Person-Day

ACL ID W02-2006
Title Bootstrapping A Multilingual Part-Of-Speech Tagger In One Person-Day
Venue International Conference on Computational Natural Language Learning
Session Main Conference
Year 2002
Authors

This paper presents a method for bootstrapping a fine-grained, broad-coverage part-of-speech (POS) tagger in a new language using only one person-day of data acquisition effort. It requires only three resources, which are currently readily available in 60-100 world languages: (1) an online or hard-copy pocket-sized bilingual dictionary, (2) a basic library reference grammar, and (3) access to an existing monolingual text corpus in the language. The algorithm begins by inducing initial lexical POS distributions from English translations in a bilingual dictionary without POS tags. It handles irregular, regular and semi-regular morphology through a robust generative model using weighted Levenshtein alignments. Unsupervised induction of grammatical gender is performed via global modelin...
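Two of the steps named in the abstract can be illustrated with a minimal Python sketch. The first function assumes a hypothetical English lexicon `ENGLISH_POS` that maps English words to tag probabilities. It averages those distributions over a foreign headword's English glosses to give an initial POS distribution for that headword. The abstract does not specify the paper's actual estimation procedure, which may differ. The second function is a generic weighted Levenshtein distance with hand-set, hypothetical costs: vowel-for-vowel substitution is cheap, to mimic stem-vowel alternation. It illustrates the alignment idea only and is not the paper's generative morphology model.

```python
from collections import Counter

# Hypothetical English lexicon: P(tag | English word). In practice this could be
# estimated from a tagged English corpus; these numbers are illustrative only.
ENGLISH_POS = {
    "house": {"NOUN": 0.9, "VERB": 0.1},
    "home": {"NOUN": 0.8, "ADV": 0.2},
    "run": {"VERB": 0.6, "NOUN": 0.4},
    "quickly": {"ADV": 1.0},
}

def induce_pos_distribution(glosses, english_pos=ENGLISH_POS):
    """Average the English POS distributions of a headword's glosses.

    Glosses missing from the English lexicon are skipped; if none are known,
    an empty distribution is returned.
    """
    totals = Counter()
    known = 0
    for gloss in glosses:
        dist = english_pos.get(gloss.lower())
        if dist is None:
            continue
        known += 1
        for tag, p in dist.items():
            totals[tag] += p
    if known == 0:
        return {}
    return {tag: p / known for tag, p in totals.items()}

VOWELS = set("aeiou")

def vowel_friendly_cost(a, b):
    """Hypothetical substitution cost: vowel-for-vowel swaps cost 0.5."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return 0.5
    return 1.0

def weighted_levenshtein(source, target, sub_cost=None, indel_cost=1.0):
    """Edit distance with a pluggable per-character substitution cost."""
    if sub_cost is None:
        # Default: plain Levenshtein (0 for a match, 1 for any substitution).
        sub_cost = lambda a, b: 0.0 if a == b else 1.0
    n, m = len(source), len(target)
    # d[i][j] = cheapest alignment of source[:i] with target[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,  # delete source[i-1]
                d[i][j - 1] + indel_cost,  # insert target[j-1]
                d[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return d[n][m]

if __name__ == "__main__":
    print(induce_pos_distribution(["house", "home"]))
    print(weighted_levenshtein("sing", "sang"))                       # 1.0
    print(weighted_levenshtein("sing", "sang", vowel_friendly_cost))  # 0.5
```

With the vowel-friendly costs, an ablaut pair such as "sing"/"sang" aligns more cheaply than under plain Levenshtein. Cost weighting of this kind lets a morphology model treat semi-regular stem changes as near-regular.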