Paper: Building Comparable Corpora Based on Bilingual LDA Model

ACL ID P13-2050
Title Building Comparable Corpora Based on Bilingual LDA Model
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2013
Authors

Comparable corpora are important basic resources in cross-language information processing. However, existing methods of building comparable corpora, which rely on inter-translatable words and related features, cannot evaluate the topical relation between document pairs. This paper adopts the bilingual LDA model to predict the topical structures of documents and proposes three algorithms for measuring the similarity of documents in different languages. Experiments show that the novel method can obtain similar documents with consistent topics and offers better adaptability and stability.
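
To make the idea of topic-based document similarity concrete, here is a minimal sketch. It assumes each document, in either language, has already been mapped to a probability distribution over one shared set of topics, as a bilingual topic model would provide. The abstract does not name the three similarity algorithms, so cosine similarity is used here purely as an illustrative stand-in. The function names `cosine_similarity` and `best_match` and the toy distributions are hypothetical, not taken from the paper.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_p * norm_q)


def best_match(source_dist, target_dists):
    """Return (index, score) of the target document closest to the source.

    Assumption: all distributions are defined over the same shared topics.
    """
    scores = [cosine_similarity(source_dist, t) for t in target_dists]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Toy example: one source-language document and two target-language candidates.
source = [0.7, 0.2, 0.1]
targets = [
    [0.1, 0.1, 0.8],  # dominated by a different topic
    [0.6, 0.3, 0.1],  # similar topical profile to the source
]
index, score = best_match(source, targets)
print(index, round(score, 3))
```

In this toy setting the second candidate is selected because its topic profile is closest to the source document's. A divergence-based measure could be substituted for `cosine_similarity` without changing `best_match`, provided the selection is flipped to pick the smallest value rather than the largest.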