Paper: Char Align: A Program For Aligning Parallel Texts At The Character Level

ACL ID P93-1001
Title Char Align: A Program For Aligning Parallel Texts At The Character Level
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 1993
Authors

There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al (1991), Gale and Church (to appear), Isabelle (1992), Kay and R/Ssenschein (to appear), Simard et al (1992), Warwick- Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR and/or unknown markup conventions), then these methods tend to break down because the noise can make it difficult to find paragraph boundaries, let alone sentences. This paper describes a new program, charalign, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al.