Paper: Text Alignment In The Real World: Improving Alignments Of Noisy Translations Using Common Lexical Features, String Matching Strategies And N-Gram Comparisons

ACL ID E95-1010
Title Text Alignment In The Real World: Improving Alignments Of Noisy Translations Using Common Lexical Features, String Matching Strategies And N-Gram Comparisons
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 1995
Authors

Alignment methods based on byte-length comparisons of alignment blocks have been remarkably successful for aligning good translations from legislative transcriptions. For noisy translations in which the parallel text of a document has significant structural differences, byte-alignment methods often do not perform well. The Pan American Health Organization (PAHO) corpus is a series of articles that were first translated by machine methods and then improved by professional translators. Many of the Spanish PAHO texts do not share formatting conventions with the corresponding English documents, refer to tables in stylistically different ways, and contain extraneous information. A method based on a dynamic programming framework, but using a decision criterion derived from a combination of ...
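To make the byte-length baseline concrete, the following is a minimal sketch of length-based block alignment in a dynamic programming framework. It is not the method proposed in the paper, whose combined decision criterion is not reproduced here. The move set, the penalty values in `MOVES`, and the relative-difference cost in `length_cost` are all illustrative assumptions, chosen only to show the shape of the computation.

```python
from typing import List, Optional, Tuple

# Hypothetical penalties per alignment move: (source blocks, target blocks).
# These values are assumptions for illustration, not taken from the paper.
MOVES = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def byte_len(blocks: List[str]) -> int:
    """Total UTF-8 byte length of a group of blocks."""
    return sum(len(b.encode("utf-8")) for b in blocks)


def length_cost(src: List[str], tgt: List[str]) -> float:
    """Relative byte-length difference in [0, 1]; an assumed stand-in cost."""
    a, b = byte_len(src), byte_len(tgt)
    if a == 0 and b == 0:
        return 0.0
    return abs(a - b) / max(a, b)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return the lowest-cost sequence of aligned block groups."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0

    # Forward pass: extend every reachable cell by each allowed move.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in MOVES.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + length_cost(src[i:ni], tgt[j:nj])
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace back from the final cell to recover the aligned groups.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    path.reverse()
    return path
```

Calling `align(english_blocks, spanish_blocks)` returns a list of `(source_group, target_group)` pairs. Because the cost depends on lengths alone, a sketch like this has no way to recover when one side contains extraneous material or differently formatted tables, which is the failure mode the abstract attributes to byte-alignment on noisy translations.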