Paper: An Alignment Method For Noisy Parallel Corpora Based On Image Processing Techniques

ACL ID P97-1038
Title An Alignment Method For Noisy Parallel Corpora Based On Image Processing Techniques
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1997
Authors

This paper presents a new approach to the bitext correspondence problem (BCP) for noisy bilingual corpora, based on image processing (IP) techniques. By using one of several ways of estimating the lexical translation probability (LTP) between pairs of source and target words, we can turn a bitext into a discrete gray-level image. We contend that the BCP, when seen in this light, bears a striking resemblance to the line detection problem in IP. Therefore, BCPs, including sentence and word alignment, can benefit from a wealth of effective, well-established IP techniques, including convolution-based filters, texture analysis, and the Hough transform. This paper describes a new program, PlotAlign, that produces a word-level bitext map for noisy or non-literal bitext, based on these techniques. Keywords: a...
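The idea of treating a bitext as a gray-level image and looking for a line in it can be illustrated with a minimal sketch. This is not the PlotAlign implementation, which the abstract does not specify. The function names (`bitext_to_image`, `hough_peak`), the dictionary-based LTP table, the threshold, and the toy tokens and probabilities are all assumptions made for illustration. The sketch maps each (source word, target word) pair to a gray level proportional to its LTP, then runs a plain Hough transform over the bright pixels to find the dominant line.

```python
import math

def bitext_to_image(src, tgt, ltp, levels=256):
    """Build a len(tgt) x len(src) grid of gray levels in [0, levels - 1].

    `ltp` is a hypothetical table mapping (source_word, target_word) to an
    estimated lexical translation probability in [0, 1]; missing pairs
    count as 0.
    """
    top = levels - 1
    return [[int(round(ltp.get((s, t), 0.0) * top)) for s in src] for t in tgt]

def hough_peak(image, threshold=128, n_theta=180):
    """Vote in (rho, theta) space for every pixel at or above `threshold`.

    Uses the normal form rho = x*cos(theta) + y*sin(theta), with x the
    source position (column) and y the target position (row).
    Returns (rho, theta_in_degrees, votes) for the best cell, or None if
    no pixel passes the threshold.
    """
    votes = {}
    for y, row in enumerate(image):
        for x, gray in enumerate(row):
            if gray < threshold:
                continue
            for k in range(n_theta):
                theta = math.pi * k / n_theta
                rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
                votes[(rho, k)] = votes.get((rho, k), 0) + 1
    if not votes:
        return None
    (rho, k), count = max(votes.items(), key=lambda item: item[1])
    return rho, 180.0 * k / n_theta, count

# Toy data (made up): six source and six target tokens that translate
# one-to-one in order, plus one weak spurious pair acting as noise.
src = ["s0", "s1", "s2", "s3", "s4", "s5"]
tgt = ["t0", "t1", "t2", "t3", "t4", "t5"]
ltp = {(s, t): 0.9 for s, t in zip(src, tgt)}
ltp[("s0", "t4")] = 0.3  # noise, falls below the threshold

image = bitext_to_image(src, tgt, ltp)
peak = hough_peak(image)
print(peak)
```

On this toy input the six diagonal pixels all vote for a line through the origin whose normal lies near 135 degrees, which corresponds to the main diagonal of the bitext image. With so small a grid several neighbouring angles tie, so the reported angle is only approximate. A real system would also need the filtering steps the abstract mentions (for example convolution-based filters) before line detection, which this sketch omits.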