Paper: Bitext Correspondences through Rich Mark-up

ACL ID: P98-2134
Title: Bitext Correspondences through Rich Mark-up
Venue: Annual Meeting of the Association for Computational Linguistics
Session: Main Conference
Year: 1998

Rich mark-up can considerably benefit the process of establishing bitext correspondences, that is, the task of correctly identifying and aligning text segments that are translation equivalents of each other in a parallel corpus. We present a sentence alignment algorithm that, by taking advantage of previously annotated texts, obtains accuracy rates close to 100%. The algorithm evaluates the similarity of the linguistic and extra-linguistic mark-up on both sides of a bitext. Because annotations are neutral with respect to typological, grammatical and orthographical differences between languages, rich mark-up is an optimal foundation for bitext correspondences. The main originality of this approach is that it makes maximal use of annotations...
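The abstract does not specify the similarity measure or the search procedure, so the following is only a rough illustration of the general idea of aligning sentences by comparing their mark-up. It assumes each sentence has already been reduced to the list of tags it carries (numbers, dates, proper names and so on). It then scores candidate pairings with a Dice coefficient over those tag multisets and finds the best alignment with a standard dynamic-programming search over 1-1, 1-0, 0-1, 2-1 and 1-2 beads, in the style of length-based sentence aligners. The tag strings, `SKIP_PENALTY`, `tag_similarity` and `align` are all hypothetical. This is not the authors' algorithm.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical representation: a sentence is the list of mark-up tags it contains.
Sentence = List[str]
Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)

# Assumed alignment patterns: (number of source sentences, number of target sentences).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_PENALTY = 0.5  # assumed cost of leaving a sentence unaligned


def tag_similarity(a: List[Sentence], b: List[Sentence]) -> float:
    """Dice coefficient between the tag multisets of two groups of sentences."""
    ca = Counter(t for s in a for t in s)
    cb = Counter(t for s in b for t in s)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0
    overlap = sum((ca & cb).values())  # multiset intersection
    return 2.0 * overlap / total


def align(src: List[Sentence], tgt: List[Sentence]) -> List[Bead]:
    """Return the bead sequence that maximises total mark-up similarity."""
    n, m = len(src), len(tgt)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Forward pass: row-major order guarantees every cell is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    score = -SKIP_PENALTY  # insertion or deletion
                else:
                    score = tag_similarity(src[i:ni], tgt[j:nj])
                if best[i][j] + score > best[ni][nj]:
                    best[ni][nj] = best[i][j] + score
                    back[ni][nj] = (i, j)
    # Trace back from the end of both texts to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, two source sentences whose tags together match a single target sentence are merged into one 2-1 bead: `align([["num:1"], ["num:2"]], [["num:1", "num:2"]])` returns `[([0, 1], [0])]`. In a real system, the skip penalty and the set of allowed beads would need tuning on annotated data.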