Paper: Inducing Multilingual Text Analysis Tools Via Robust Projection Across Aligned Corpora

ACL ID H01-1035
Title Inducing Multilingual Text Analysis Tools Via Robust Projection Across Aligned Corpora
Venue Human Language Technologies
Session Main Conference
Year 2001
Authors

This paper describes a system and set of algorithms for automatically inducing stand-alone monolingual part-of-speech taggers, base noun-phrase bracketers, named-entity taggers and morphological analyzers for an arbitrary foreign language. Case studies include French, Chinese, Czech and Spanish. Existing text analysis tools for English are applied to bilingual text corpora and their output projected onto the second language via statistically derived word alignments. Simple direct annotation projection is quite noisy, however, even with optimal alignments. Thus this paper presents noise-robust tagger, bracketer and lemmatizer training procedures capable of accurate system bootstrapping from noisy and incomplete initial projections. Performance of the induced stand-alone part-of-speech t...
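
The direct projection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_tags`, the representation of alignments as (English index, foreign index) pairs, and the rule that the first aligned tag wins are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def project_tags(
    english_tags: List[str],
    alignments: List[Tuple[int, int]],
    foreign_length: int,
) -> List[Optional[str]]:
    """Copy English POS tags onto foreign tokens through word alignments.

    Hypothetical helper: each alignment is an (english_index, foreign_index)
    pair. Foreign tokens with no alignment keep None, which is one source of
    the incomplete projections the abstract refers to.
    """
    projected: List[Optional[str]] = [None] * foreign_length
    for e_idx, f_idx in alignments:
        # Skip alignment links that point outside either sentence.
        if not (0 <= e_idx < len(english_tags)):
            continue
        if not (0 <= f_idx < foreign_length):
            continue
        # Assumption: the first tag projected onto a foreign token is kept.
        if projected[f_idx] is None:
            projected[f_idx] = english_tags[e_idx]
    return projected


# English: "the red house"  ->  French: "la maison rouge"
english_tags = ["DT", "JJ", "NN"]
alignments = [(0, 0), (1, 2), (2, 1)]
print(project_tags(english_tags, alignments, foreign_length=3))
```

In this sketch an unaligned foreign token is left untagged and a misaligned one receives the wrong tag. These gaps and errors are the kind of noise that, according to the abstract, the robust training procedures are meant to tolerate.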