Paper: Automatically Inducing a Part-of-Speech Tagger by Projecting from Multiple Source Languages Across Aligned Corpora

ACL ID I05-1075
Title Automatically Inducing a Part-of-Speech Tagger by Projecting from Multiple Source Languages Across Aligned Corpora
Venue International Joint Conference on Natural Language Processing
Session Main Conference
Year 2005
Authors

We implement a variant of the algorithm described by Yarowsky and Ngai in [21] to induce an HMM POS tagger for an arbitrary target language using only an existing POS tagger for a source language and an unannotated parallel corpus between the source and target languages. We extend this work by projecting from multiple source languages onto a single target language. We hypothesize that systematic transfer errors from differing source languages will cancel out, improving the quality of bootstrapped resources in the target language. Our experiments confirm the hypothesis. Each experiment compares three cases: (a) source data comes from a single language A, (b) source data comes from a single language B, and (c) source data comes from both A and B, but half as much from each. Apart fro...
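
The projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (project_tags, emission_counts, merge_counts, most_likely_tag), the rule that the first alignment link wins, and the pooling of counts across source languages are all assumptions made for illustration. The sketch also covers only word-to-tag counts, not the full HMM tagger.

```python
from collections import Counter, defaultdict


def project_tags(source_tags, alignment, target_len):
    """Copy source tags onto aligned target tokens.

    alignment is a list of (source_index, target_index) pairs.
    Assumption: the first link reaching a target token wins;
    unaligned target tokens stay None.
    """
    projected = [None] * target_len
    for src_i, tgt_i in alignment:
        if projected[tgt_i] is None:
            projected[tgt_i] = source_tags[src_i]
    return projected


def emission_counts(target_sentences, projected_tag_seqs):
    """Count how often each target word received each projected tag."""
    counts = defaultdict(Counter)
    for words, tags in zip(target_sentences, projected_tag_seqs):
        for word, tag in zip(words, tags):
            if tag is not None:
                counts[word][tag] += 1
    return counts


def merge_counts(*tables):
    """Pool count tables from several source languages (assumed scheme)."""
    merged = defaultdict(Counter)
    for table in tables:
        for word, tag_counts in table.items():
            merged[word].update(tag_counts)
    return merged


def most_likely_tag(counts, word):
    """Return the most frequent projected tag for word, or None if unseen."""
    tag_counts = counts.get(word)
    return tag_counts.most_common(1)[0][0] if tag_counts else None


# Toy data (invented): the same two target sentences, tagged by projection
# from hypothetical source languages A and B.
target = [["w1", "w2"], ["w1", "w2"]]
from_a = [project_tags(["NOUN", "VERB"], [(0, 0), (1, 1)], 2),
          project_tags(["NOUN", "NOUN"], [(0, 0), (1, 1)], 2)]
from_b = [project_tags(["NOUN", "VERB"], [(0, 0), (1, 1)], 2),
          project_tags(["NOUN", "VERB"], [(0, 0), (1, 1)], 2)]

pooled = merge_counts(emission_counts(target, from_a),
                      emission_counts(target, from_b))
print(most_likely_tag(pooled, "w2"))
```

In this toy case, source A alone leaves "w2" split evenly between NOUN and VERB, while the pooled counts favor VERB three to one. That is the kind of error cancellation the hypothesis describes, shown here on invented data only.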