Paper: Semi-Supervised Lexicon Mining from Parenthetical Expressions in Monolingual Web Pages

ACL ID N09-1048
Title Semi-Supervised Lexicon Mining from Parenthetical Expressions in Monolingual Web Pages
Venue Human Language Technologies
Session Main Conference
Year 2009
Authors

This paper presents a semi-supervised learning framework for mining Chinese-English lexicons from a large number of Chinese Web pages. The work is motivated by the observation that many Chinese neologisms are accompanied by their English translations in parentheses. We classify parenthetical translations into bilingual abbreviations, transliterations, and translations. A frequency-based term recognition approach is applied to extract bilingual abbreviations. A self-training algorithm is proposed for mining transliteration and translation lexicons. In this algorithm, we employ available lexicons at the morpheme level, i.e., phoneme correspondences in transliteration and grapheme (e.g., suffix, stem, and prefix) correspondences in translation. The experimental results...
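
As a rough illustration of the kind of input such a framework starts from, the sketch below collects parenthetical candidates from Chinese text and counts how often each occurs. It is not the paper's method: the regular expression, the function names `extract_candidates` and `count_candidates`, and the sample sentences are all assumptions made for illustration. In particular, the sketch keeps the whole run of Chinese characters before the parenthesis and does not try to decide where the Chinese term actually begins.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not taken from the paper): a run of
# Chinese characters followed by an English term enclosed in ASCII or
# full-width parentheses.
PAREN = re.compile(
    r"([\u4e00-\u9fff]+)\s*[(（]\s*([A-Za-z][A-Za-z0-9 .\-]*?)\s*[)）]"
)

def extract_candidates(text):
    """Return (chinese_context, english_term) pairs found in text."""
    return [(m.group(1), m.group(2)) for m in PAREN.finditer(text)]

def count_candidates(pages):
    """Count how often each candidate pair occurs across a list of pages."""
    counts = Counter()
    for page in pages:
        counts.update(extract_candidates(page))
    return counts

if __name__ == "__main__":
    pages = ["互联网(Internet)", "互联网(Internet)很大", "他喜欢互联网（Internet）。"]
    for pair, freq in count_candidates(pages).most_common():
        print(pair, freq)
```

Counting raw pairs like this only produces noisy candidates; classifying them into abbreviations, transliterations, and translations, and trimming the Chinese side to the actual term, is the work the abstract attributes to the frequency-based and self-training components.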