ACL Anthology Network (All About NLP) (beta) The Association Of Computational Linguistics Anthology Network |
ACL ID | I05-1075 |
---|---|
Title | Automatically Inducing a Part-of-Speech Tagger by Projecting from Multiple Source Languages Across Aligned Corpora |
Venue | International Joint Conference on Natural Language Processing |
Session | Main Conference |
Year | 2005 |
Authors |
|
We implement a variant of the algorithm described by Yarowsky and Ngai in [21] to induce an HMM POS tagger for an arbitrary target language using only an existing POS tagger for a source language and an unannotated parallel corpus between the source and target languages. We extend this work by projecting from multiple source languages onto a single target language. We hypothesize that systematic transfer errors from differing source languages will cancel out, improving the quality of bootstrapped resources in the target language. Our experiments confirm the hypothesis. Each experiment compares three cases: (a) source data comes from a single language A, (b) source data comes from a single language B, and (c) source data comes from both A and B, but half as much from each. Apart fro...
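The projection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `project_tags` and `combine_projections`, the `(source_index, target_index)` alignment format, and the simple majority vote are all assumptions made for illustration, and the abstract does not say how the projected tags from different source languages are actually combined or how the HMM is trained from them.

```python
from collections import Counter

def project_tags(source_tags, alignment, target_len):
    """Copy each source token's tag onto the target tokens it is aligned to.

    source_tags: list of POS tags for one source sentence.
    alignment:   list of (source_index, target_index) pairs (assumed format).
    Returns one Counter of projected tags per target token.
    """
    votes = [Counter() for _ in range(target_len)]
    for src_idx, tgt_idx in alignment:
        votes[tgt_idx][source_tags[src_idx]] += 1
    return votes

def combine_projections(per_source_votes):
    """Sum the votes from several source languages and keep the most frequent
    tag per target token (hypothetical combination rule; None if unaligned)."""
    target_len = len(per_source_votes[0])
    combined = []
    for i in range(target_len):
        total = Counter()
        for votes in per_source_votes:
            total.update(votes[i])
        combined.append(total.most_common(1)[0][0] if total else None)
    return combined

# Toy example: source A mis-tags the second token, sources B and C outvote it.
votes_a = project_tags(["DET", "ADJ", "VERB"], [(0, 0), (1, 1), (2, 2)], 3)
votes_b = project_tags(["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)], 3)
votes_c = project_tags(["NOUN", "VERB"], [(0, 1), (1, 2)], 3)
print(combine_projections([votes_a, votes_b, votes_c]))
```

In this toy case the error projected from source A is outvoted by the other sources, which mirrors the hypothesis that transfer errors from different source languages tend to cancel. In a fuller pipeline the combined tags would presumably serve as noisy training data for the target-language HMM tagger rather than as final output.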