Paper: Identifying Word Translations from Comparable Corpora Using Latent Topic Models

ACL ID P11-2084
Title Identifying Word Translations from Comparable Corpora Using Latent Topic Models
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

A topic model outputs a set of multinomial distributions over words for each topic. In this paper, we investigate the value of bilingual topic models, i.e., a bilingual Latent Dirichlet Allocation (LDA) model, for finding translations of terms in comparable corpora without using any linguistic resources. Experiments on a document-aligned English-Italian Wikipedia corpus confirm that the developed methods, which use only knowledge from word-topic distributions, outperform methods based on similarity measures in the original word-document space. We also report the best results, which are obtained by combining knowledge from word-topic distributions with similarity measures in the original space.
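To make the idea of finding translations from word-topic distributions more concrete, here is a minimal Python sketch. It assumes a bilingual LDA model has already been trained, so that every English and Italian word has a probability P(w|z_k) under each of K shared topics. The sketch turns these into per-word topic profiles P(z_k|w) under an assumed uniform topic prior, then ranks Italian candidates by their Jensen-Shannon divergence from the English word's profile. The divergence choice, the function names (`topic_profile`, `kl_divergence`, `js_divergence`, `rank_translations`) and the toy probabilities are all illustrative assumptions. This is not the paper's exact scoring method.

```python
import math

# Hypothetical toy output of a bilingual LDA model: P(w | z_k) for K = 3 shared topics.
english = {"dog": [0.020, 0.001, 0.002]}
italian = {
    "cane": [0.018, 0.002, 0.001],
    "gatto": [0.010, 0.008, 0.001],
    "legge": [0.001, 0.002, 0.030],
}


def topic_profile(word_given_topic):
    """Turn P(w|z_k) into P(z_k|w), ASSUMING a uniform prior over topics."""
    total = sum(word_given_topic)
    return [p / total for p in word_given_topic]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_translations(source_vector, target_vocab):
    """Hypothetical helper: rank target-language words by the JS divergence
    of their topic profiles from the source word's profile (lowest first)."""
    source = topic_profile(source_vector)
    scores = {
        word: js_divergence(source, topic_profile(vector))
        for word, vector in target_vocab.items()
    }
    return sorted(scores, key=scores.get)


print(rank_translations(english["dog"], italian))  # → ['cane', 'gatto', 'legge']
```

A lower divergence means the two words spread their probability mass over the shared topics more similarly, so the top-ranked Italian word is the proposed translation. Only the topic vectors are consulted; no dictionary or other linguistic resource is used.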