Paper: A Portable Algorithm For Mapping Bitext Correspondence

ACL ID P97-1039
Title A Portable Algorithm For Mapping Bitext Correspondence
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1997

The first step in most empirical work in multilingual NLP is to construct maps of the correspondence between texts and their translations (bitext maps). The Smooth Injective Map Recognizer (SIMR) algorithm presented here is a generic pattern recognition algorithm that is particularly well-suited to mapping bitext correspondence. SIMR is faster and significantly more accurate than other algorithms in the literature. The algorithm is robust enough to use on noisy texts, such as those resulting from OCR input, and on translations that are not very literal. SIMR encapsulates its language-specific heuristics, so that it can be ported to any language pair with minimal effort.