Paper: Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

ACL ID: D13-1179
Title: Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2013
Authors:

Cross-lingual topic modelling has applications in machine translation, word sense disambiguation and terminology alignment. Multilingual extensions of approaches based on latent (LSI), generative (LDA, PLSI) and explicit (ESA) topic modelling can induce an interlingual topic space, allowing documents in different languages to be mapped into the same space and thus compared across languages. In this paper, we present a novel approach that combines latent and explicit topic modelling: it builds on a set of explicitly defined topics but then computes latent relations between them. The method thus combines the benefits of both explicit and latent topic modelling approaches. We show that on a cross-lingual mate retrieval task, our model si...
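The abstract describes mapping documents onto explicitly defined topics and then accounting for the relations between those topics. The sketch below illustrates one reading of that idea under stated assumptions, not the authors' implementation: each language has a term-by-topic matrix `X` whose columns are aligned across languages (for example, interlanguage-linked articles); plain ESA scores a document vector `d` as `X^T d`; and the "orthonormal" step is assumed to be the Moore-Penrose pseudo-inverse mapping `(X^T X)^{-1} X^T d`, which requires the topic columns to be linearly independent. Mate retrieval is modelled as picking, for each source document, the target-language document with the highest cosine similarity. All matrices, vocabularies and function names are hypothetical toy choices.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows are terms, columns are explicit topics


def esa(d: Vector, X: Matrix) -> Vector:
    """ESA-style projection X^T d: one association score per explicit topic."""
    return [sum(X[i][k] * d[i] for i in range(len(X))) for k in range(len(X[0]))]


def gram(X: Matrix) -> Matrix:
    """Topic-by-topic overlap matrix X^T X (the 'latent relations' between topics)."""
    n = len(X[0])
    return [[sum(row[j] * row[k] for row in X) for k in range(n)] for j in range(n)]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[r]] + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: topic columns are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def orthonormal_esa(d: Vector, X: Matrix) -> Vector:
    """Assumed orthonormalised mapping (X^T X)^{-1} X^T d, i.e. the pseudo-inverse of X applied to d."""
    return solve(gram(X), esa(d, X))


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mate_retrieval(src_docs: List[Vector], X_src: Matrix,
                   tgt_docs: List[Vector], X_tgt: Matrix) -> List[int]:
    """For each source document, return the index of the most similar target document."""
    src = [orthonormal_esa(d, X_src) for d in src_docs]
    tgt = [orthonormal_esa(d, X_tgt) for d in tgt_docs]
    return [max(range(len(tgt)), key=lambda j: cosine(s, tgt[j])) for s in src]


# Hypothetical toy data: 4 terms per language, 2 explicit topics whose columns
# are aligned across languages (topic 0 ~ "animals", topic 1 ~ "vehicles").
X_EN = [[2, 0], [1, 1], [0, 2], [0, 1]]   # cat, dog, car, engine
X_DE = [[2, 0], [1, 1], [0, 2], [1, 1]]   # Katze, Hund, Auto, Motor

EN_DOCS = [[1, 1, 0, 0], [0, 0, 1, 1]]    # "cat dog", "car engine"
DE_DOCS = [[0, 0, 1, 1], [1, 1, 0, 0]]    # "Auto Motor", "Katze Hund"

if __name__ == "__main__":
    print(mate_retrieval(EN_DOCS, X_EN, DE_DOCS, X_DE))  # -> [1, 0]
```

A useful sanity check on the assumed mapping: applying `orthonormal_esa` to a topic's own column returns the corresponding unit vector (for example, column 0 of `X_EN` maps to `[1.0, 0.0]`), so the topics behave as an orthonormal basis in the mapped space even though their raw term profiles overlap, which plain `esa` does not guarantee.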