Paper: Benefits of the ‘Massively Parallel Rosetta Stone’: Cross-Language Information Retrieval with over 30 Languages

ACL ID P07-1110
Title Benefits of the ‘Massively Parallel Rosetta Stone’: Cross-Language Information Retrieval with over 30 Languages
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2007
Authors

In this paper, we describe our experiences in extending a standard cross-language information retrieval (CLIR) approach that uses parallel aligned corpora and Latent Semantic Indexing (LSI). Most, if not all, previous work following this approach has focused on bilingual retrieval; two examples involve the use of French-English or English-Greek parallel corpora. Our extension to the approach is ‘massively parallel’ in two senses, one linguistic and the other computational. First, we make use of a parallel aligned corpus consisting of almost 50 parallel translations in over 30 distinct languages, each comprising over 30,000 documents. Given the size of this dataset, a ‘massively parallel’ approach was also necessitated in the more usual computational sense. Our results indicate that...
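The standard CLIR approach that the abstract extends can be illustrated with a small sketch. It is a generic, textbook-style illustration of cross-language LSI and is not the authors' implementation. All translations of each aligned document are merged into one column of a term-by-document matrix A. A rank-k truncated SVD of A is then computed, and queries and target documents in any training language are folded into the shared space as q^T U_k Σ_k^{-1} and compared by cosine similarity. Several choices are assumptions rather than details from the paper: whitespace tokenization, raw term counts instead of any weighting scheme the paper may use, a dense pure-Python subspace iteration standing in for a sparse SVD library, and a four-document English-French toy corpus. The function names (`train_lsi`, `fold_in`, `retrieve`) are hypothetical. The sketch also leaves out the computational parallelization the abstract mentions.

```python
# Hypothetical sketch of cross-language LSI on a parallel aligned corpus.
# Assumptions (not from the paper): whitespace tokens, raw counts, toy data.
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; real corpora need per-language tokenizers.
    return text.lower().split()


def orthonormalize(vectors):
    """Gram-Schmidt: return an orthonormal basis for the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            proj = sum(a * b for a, b in zip(w, u))
            w = [a - proj * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def matvec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def train_lsi(parallel_docs, k, iters=200, seed=0):
    """Build a k-dimensional multilingual LSI space (k must not exceed rank(A)).

    parallel_docs: one dict {language code: text} per aligned document.
    All translations of a document are merged into a single column of the
    term-by-document matrix A, so terms from different languages co-occur.
    """
    cols = [Counter(t for text in doc.values() for t in tokenize(text))
            for doc in parallel_docs]
    n = len(cols)
    # G = A^T A (documents x documents); its eigenvectors are A's right singular vectors.
    gram = [[sum(c * cols[j][t] for t, c in cols[i].items()) for j in range(n)]
            for i in range(n)]
    # Subspace iteration: multiply k vectors by G and re-orthonormalize until
    # they span the top-k eigenspace of G.
    rng = random.Random(seed)
    vs = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)])
    for _ in range(iters):
        vs = orthonormalize([matvec(gram, v) for v in vs])
    us, sigmas = [], []
    for v in vs:
        # Rayleigh quotient v^T G v approximates sigma^2 once v has converged.
        sigma = math.sqrt(sum(a * b for a, b in zip(v, matvec(gram, v))))
        # Left singular vector u = A v / sigma, stored sparsely as {term: weight}.
        u = defaultdict(float)
        for col, x in zip(cols, v):
            for t, c in col.items():
                u[t] += c * x / sigma
        us.append(u)
        sigmas.append(sigma)
    return us, sigmas


def fold_in(text, space):
    """Project a query or unseen document: q_hat = q^T U_k Sigma_k^{-1}."""
    us, sigmas = space
    counts = Counter(tokenize(text))
    return [sum(c * u.get(t, 0.0) for t, c in counts.items()) / s
            for u, s in zip(us, sigmas)]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # e.g. every term was out of vocabulary
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def retrieve(query, targets, space):
    """Rank target documents (in any training language) by cosine to the query."""
    q = fold_in(query, space)
    scores = [(i, cosine(q, fold_in(t, space))) for i, t in enumerate(targets)]
    return sorted(scores, key=lambda p: p[1], reverse=True)


# Toy English-French parallel corpus (hypothetical data).
corpus = [
    {"en": "house garden", "fr": "maison jardin"},
    {"en": "house roof", "fr": "maison toit"},
    {"en": "cat dog", "fr": "chat chien"},
    {"en": "cat mouse", "fr": "chat souris"},
]
space = train_lsi(corpus, k=2)
# English query against French targets: 'maison jardin' (index 1) ranks first.
print(retrieve("house", ["chat chien", "maison jardin"], space))
```

In this sketch every translation of a document shares a single column, so the query language and the target language can be any pair seen during training. Going from two languages to 30 or more only adds entries to each dict. Terms are not tagged by language, so a string that is a word in two languages occupies a single row. Prefixing tokens with a language code would keep such words apart. For a corpus of the size described above, a sparse truncated SVD routine would take the place of the dense loop used here.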