Paper: A Geometric Approach To Mapping Bitext Correspondence

Year: 1996

The first step in most corpus-based multilingual NLP work is to construct a detailed map of the correspondence between a text and its translation. Several automatic methods for this task have been proposed in recent years. Yet even the best of these methods can err by several typeset pages. The Smooth Injective Map Recognizer (SIMR) is a new bitext mapping algorithm. SIMR's errors are smaller than those of the previous front-runner by more than a factor of 4. Its robustness has enabled new commercial-quality applications. The greedy nature of the algorithm makes it independent of memory resources. Unlike other bitext mapping algorithms, SIMR allows crossing correspondences to account for word order differences. Its output can be converted quickly and easily into a senten...