Paper: Automatic Alignment In Parallel Corpora

ACL ID P94-1051
Title Automatic Alignment In Parallel Corpora
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1994

This paper addresses the alignment problem in the context of exploiting large bilingual and multilingual corpora for translation purposes. A generic alignment scheme is proposed that can meet the varying requirements of different applications. Depending on the level at which alignment is sought, appropriate surface linguistic information is invoked, coupled with information about possible unit delimiters. Each text unit (sentence, clause or phrase) is represented by the sum of its content tags. These representations are then fed into a dynamic programming framework that computes the optimum alignment of units. The proposed scheme has been tested at sentence level on parallel corpora of the CELEX database. The success rate exceeded 99%. The next steps of the work concern the testing of the scheme's efficiency ...
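The dynamic programming step described in the abstract can be illustrated with a minimal sketch. It assumes each text unit has already been reduced to a single integer, its number of content tags. The bead types, the penalty values, the absolute-difference cost and the name `align` are all illustrative assumptions, not the scoring used in the paper, which the abstract does not specify.

```python
from typing import List, Tuple

# Hypothetical bead types (source units consumed, target units consumed)
# with illustrative penalties; the paper's actual values are not given here.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}

Span = Tuple[int, int]  # half-open index range [start, end)


def align(src: List[int], tgt: List[int]) -> List[Tuple[Span, Span]]:
    """Align two sequences of content-tag counts with minimum total cost.

    The cost of a bead is the absolute difference between the summed tag
    counts on each side plus the bead penalty (an assumed cost function).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward pass: extend every reachable prefix pair by one bead.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                diff = abs(sum(src[i:ni]) - sum(tgt[j:nj]))
                candidate = cost[i][j] + diff + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Backward pass: recover the beads on the cheapest path.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    beads.reverse()
    return beads
```

With the assumed costs, `align([5, 3, 4, 6], [5, 7, 6])` pairs the first and last units one-to-one and merges the two middle source units against the single middle target unit. A realistic system would replace the toy cost with a score derived from the actual tag distributions.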