Paper: Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora

ACL ID P08-1049
Title Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008
Authors

Chinese abbreviations are widely used in modern Chinese texts. Compared with English abbreviations (which are mostly acronyms and truncations), the formation of Chinese abbreviations is much more complex. Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated. In this paper, we present a novel unsupervised method that automatically extracts the relation between a full-form phrase and its abbreviation from monolingual corpora, and induces translation entries for the abbreviation by using its full-form as a bridge. Our method does not require any additional annotated data other than the data that a regular translation system uses. We in...
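
The bridging idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `induce_abbreviation_translations`, the two input dictionaries, the toy scores, and the rule of multiplying the two scores are all assumptions made for illustration. The sketch takes an abbreviation-to-full-form mapping (which the paper extracts from monolingual corpora) and an existing full-form translation table, and produces translation entries for the abbreviation.

```python
def induce_abbreviation_translations(abbr_to_full, phrase_table):
    """Induce translation entries for abbreviations via their full forms.

    abbr_to_full: {abbreviation: {full_form: relation_score}}
        Hypothetical output of the full-form/abbreviation extraction step.
    phrase_table: {full_form: {translation: translation_score}}
        Hypothetical translation table of a regular translation system.
    Returns {abbreviation: {translation: combined_score}}.
    """
    induced = {}
    for abbr, full_forms in abbr_to_full.items():
        candidates = {}
        for full_form, relation_score in full_forms.items():
            # The full form acts as the bridge: look up its translations.
            for translation, translation_score in phrase_table.get(full_form, {}).items():
                # Assumption: combine the two scores by multiplication
                # and keep the best score per candidate translation.
                score = relation_score * translation_score
                if score > candidates.get(translation, 0.0):
                    candidates[translation] = score
        if candidates:
            induced[abbr] = candidates
    return induced


# Toy data, invented for illustration only.
abbr_to_full = {"北大": {"北京大学": 0.5}}
phrase_table = {"北京大学": {"Peking University": 0.5}}
entries = induce_abbreviation_translations(abbr_to_full, phrase_table)
```

Abbreviations whose full forms have no entry in the translation table are simply omitted from the result, which mirrors the situation where no bridge is available.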