Paper: Topic Models + Word Alignment = A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus

ACL ID W13-3523
Title Topic Models + Word Alignment = A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus
Venue International Conference on Computational Natural Language Learning
Session Main Conference
Year 2013
Authors

We propose a flexible and effective frame- work for extracting a bilingual dictionary from comparable corpora. Our approach is based on a novel combination of topic modeling and word alignment techniques. Intuitively, our approach works by con- verting a comparable document-aligned corpus into a parallel topic-aligned cor- pus, then learning word alignments us- ing co-occurrence statistics. This topic- aligned corpus is similar in structure to the sentence-aligned corpus frequently used in statistical machine translation, enabling us to exploit advances in word alignment re- search. Unlike many previous work, our framework does not require any language- specific knowledge for initialization. Fur- thermore, our framework attempts to han- dle polysemy by allowing multiple trans- lation proba...