Paper: Low-Cost High-Performance Translation Retrieval: Dumber Is Better

ACL ID P01-1004
Title Low-Cost High-Performance Translation Retrieval: Dumber Is Better
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2001
Authors

In this paper, we compare the relative effects of segment order, segmentation and segment contiguity on the retrieval performance of a translation memory system. We take a selection of both bag-of-words and segment order-sensitive string comparison methods, and run each over both character- and word-segmented data, in combination with a range of local segment contiguity models (in the form of N-grams). Over two distinct datasets, we find that indexing according to simple character bigrams produces a retrieval accuracy superior to any of the tested word N-gram models. Further, in their optimum configuration, bag-of-words methods are shown to be equivalent to segment order-sensitive methods in terms of retrieval accuracy, but much faster. We also provide evidence that our findings are scalable.
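To make the idea of character-bigram, bag-of-words retrieval concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not name a specific similarity measure, so the Dice coefficient used here is an assumption, and the function names (`char_ngrams`, `dice_similarity`, `retrieve`) and the toy translation memory are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the multiset of contiguous character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice_similarity(a, b, n=2):
    """Dice coefficient over bags of character n-grams.

    Assumption: Dice is used purely for illustration; segment order is
    ignored beyond what each individual n-gram captures.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((grams_a & grams_b).values())
    return 2.0 * overlap / total


def retrieve(query, memory, n=2):
    """Return the (source, translation) pair whose source best matches query."""
    return max(memory, key=lambda pair: dice_similarity(query, pair[0], n))


# Hypothetical toy translation memory of (source, translation) pairs.
memory = [
    ("open the file", "ouvrir le fichier"),
    ("close the window", "fermer la fenêtre"),
]
best_source, best_translation = retrieve("open a file", memory)
```

Because the comparison works on character bigrams, no word segmentation step is needed, which is the "dumber" configuration the abstract reports as the most accurate of those tested.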