Paper: Compiling a Massive Multilingual Dictionary via Probabilistic Inference

ACL ID P09-1030
Title Compiling a Massive Multilingual Dictionary via Probabilistic Inference
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009
Authors

Can we automatically compose a large set of Wiktionaries and translation dictionaries to yield a massive, multilingual dictionary whose coverage is substantially greater than that of any of its constituent dictionaries? The composition of multiple translation dictionaries leads to a transitive inference problem: if word A translates to word B, which in turn translates to word C, what is the probability that C is a translation of A? The paper introduces a novel algorithm that solves this problem for 10,000,000 words in more than 1,000 languages. The algorithm yields PANDICTIONARY, a novel multilingual dictionary. PANDICTIONARY contains more than four times as many translations as the largest Wiktionary at precision 0.90 and over 200,000,000 pairwise translations in over 200,000 lan...
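
To make the transitive inference problem concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a toy translation graph with hand-picked edge probabilities and scores a candidate pair (A, C) with a noisy-or over every intermediate word B. The function names `build_graph` and `two_hop_score`, the example words, and the 0.9 edge confidence are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_graph(entries):
    """Build an undirected translation graph from (word_a, word_b, prob) triples.

    Treating translation as symmetric is an assumption of this sketch.
    """
    graph = defaultdict(dict)
    for a, b, p in entries:
        graph[a][b] = p
        graph[b][a] = p
    return graph


def two_hop_score(graph, a, c):
    """Noisy-or score that c translates a, via every pivot b with edges a-b and b-c.

    Each path a-b-c is assumed to succeed independently with probability
    p(a, b) * p(b, c); this independence is an illustrative simplification.
    """
    miss = 1.0  # probability that every two-hop path fails
    for b, p_ab in graph.get(a, {}).items():
        p_bc = graph.get(b, {}).get(c)
        if p_bc is not None:
            miss *= 1.0 - p_ab * p_bc
    return 1.0 - miss


# Hypothetical toy data: English "spring" is polysemous (season vs. coil).
entries = [
    ("en:spring", "fr:printemps", 0.9),  # season
    ("en:spring", "fr:ressort", 0.9),    # coil
    ("en:spring", "es:primavera", 0.9),  # season
    ("nl:lente", "fr:printemps", 0.9),   # season
    ("nl:lente", "es:primavera", 0.9),   # season
]
graph = build_graph(entries)

season = two_hop_score(graph, "es:primavera", "fr:printemps")
coil = two_hop_score(graph, "es:primavera", "fr:ressort")
print(season > coil)
```

In this toy graph the correct pair (es:primavera, fr:printemps) is supported by two pivots, while the incorrect pair (es:primavera, fr:ressort) is reachable only through the polysemous word "spring", so it receives a lower score. This is why a single chain A to B to C cannot simply be trusted.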