Paper: Multi-View Co-Training of Transliteration Model

ACL ID I08-1049
Title Multi-View Co-Training of Transliteration Model
Venue International Joint Conference on Natural Language Processing
Session Main Conference
Year 2008

This paper presents a new approach to training a transliteration model from unlabeled data for transliteration extraction. We begin with an inquiry into the formulation of the transliteration model by casting different transliteration strategies as a multi-view problem, where each view exploits a natural division of transliteration features, such as phoneme-based, grapheme-based, or hybrid features. We then introduce a multi-view Co-training algorithm, which leverages compatible and partially uncorrelated information across the different views to effectively boost the model using unlabeled data. Applied to transliteration extraction, the algorithm not only circumvents the need for data labeling, but also achieves performance close to that of supervised learning.
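The multi-view Co-training idea can be illustrated with a minimal sketch. The data, the centroid classifier, and all names below are illustrative assumptions, not the paper's actual feature sets or model: two feature views stand in for the phoneme-based and grapheme-based views, each view trains its own simple classifier on the labeled pool, and each round the most confident predictions on unlabeled examples from one view are added as pseudo-labels for retraining both.

```python
def train(view_data, labels):
    # Toy one-feature centroid classifier standing in for a real view model.
    pos = [x for x, y in zip(view_data, labels) if y == 1]
    neg = [x for x, y in zip(view_data, labels) if y == 0]
    return (sum(pos) / len(pos), sum(neg) / len(neg))

def predict(model, x):
    # Returns (label, confidence): positive score means closer to the
    # positive centroid; |score| serves as a crude confidence.
    pos_c, neg_c = model
    score = abs(x - neg_c) - abs(x - pos_c)
    return (1 if score > 0 else 0), abs(score)

def co_train(view1, view2, labels, n_labeled, rounds=5, per_round=2):
    """Co-training sketch: each view's model pseudo-labels the unlabeled
    examples it is most confident about, growing the shared labeled pool."""
    unlabeled = set(range(n_labeled, len(labels)))
    pseudo = {i: labels[i] for i in range(n_labeled)}  # seed with gold labels
    for _ in range(rounds):
        idx = sorted(pseudo)
        m1 = train([view1[i] for i in idx], [pseudo[i] for i in idx])
        m2 = train([view2[i] for i in idx], [pseudo[i] for i in idx])
        for model, view in ((m1, view1), (m2, view2)):
            # Each view labels its most confident unlabeled examples.
            ranked = sorted(unlabeled, key=lambda i: -predict(model, view[i])[1])
            for i in ranked[:per_round]:
                pseudo[i] = predict(model, view[i])[0]
                unlabeled.discard(i)
        if not unlabeled:
            break
    return m1, m2
```

The key property the paper relies on is that the views are compatible (both can label the data) yet partially uncorrelated, so each view's confident predictions carry information the other view lacks.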