Paper: Translation Corpus Source and Size in Bilingual Retrieval

ACL ID N09-2007
Title Translation Corpus Source and Size in Bilingual Retrieval
Venue Human Language Technologies
Session Short Paper
Year 2009
Authors

This paper explores corpus-based bilingual retrieval where the translation corpora used vary by source and size. We find that the quality of translation alignments and the domain of the bitext are important. In some settings these factors are more critical than corpus size. We also show that judicious choice of tokenization can reduce the amount of bitext required to obtain good bilingual retrieval performance.