Paper: Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods

ACL ID N13-1131
Title Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

In this paper we consider the problem of labeling the languages of words in mixed-language documents. This problem is approached in a weakly supervised fashion, as a sequence labeling problem with monolingual text samples for training data. Among the approaches evaluated, a conditional random field model trained with generalized expectation criteria was the most accurate and performed consistently as the amount of training data was varied.
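
To make the weakly supervised setup concrete, the following is a minimal sketch of word-level language labeling trained only on monolingual text samples. It is not the paper's conditional random field or its generalized expectation training. It is a much simpler stand-in (a smoothed character-bigram model per language) that labels each word independently and ignores the sequence context the abstract describes. The function names, the language codes, and the tiny training samples are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, padded with boundary markers."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=2):
    """Count character n-grams per language.

    samples: dict mapping a language code to a monolingual text sample.
    No word-level labels are used; the only supervision is which
    language each whole sample is written in.
    """
    models = {}
    for lang, text in samples.items():
        counts = Counter()
        for word in text.split():
            counts.update(char_ngrams(word, n))
        models[lang] = counts
    return models


def label_words(words, models, n=2):
    """Assign each word the language whose n-gram model scores it highest."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    v = len(vocab) + 1  # add-one smoothing, with one extra slot for unseen n-grams

    labels = []
    for word in words:
        best_lang, best_score = None, -math.inf
        for lang, counts in models.items():
            total = sum(counts.values())
            score = sum(
                math.log((counts[g] + 1) / (total + v))
                for g in char_ngrams(word, n)
            )
            if score > best_score:
                best_lang, best_score = lang, score
        labels.append(best_lang)
    return labels


# Hypothetical monolingual samples (illustrative only, far too small for real use).
samples = {
    "en": "the cat sat on the mat with the other thing",
    "es": "la gata quiere el queso que hay en esta casa",
}
models = train(samples)
mixed = "the queso thing que".split()
print(list(zip(mixed, label_words(mixed, models))))
```

Because each word is scored in isolation, a sketch like this cannot exploit the tendency of neighboring words to share a language, which is the kind of information a sequence model is designed to use.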