Paper: A Program For Aligning Sentences In Bilingual Corpora

ACL ID P91-1023
Title A Program For Aligning Sentences In Bilingual Corpora
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1991
Authors

Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts: texts such as the Canadian Hansards (parliamentary proceedings), which are available in multiple languages (French and English). This paper describes a method for aligning sentences in these parallel texts, based on a simple statistical model of character lengths. The method was developed and tested on a small trilingual sample of Swiss economic reports. A much larger sample of 90 million words of Canadian Hansards has been aligned and donated to the ACL/DCI.
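To make the idea of aligning on character lengths concrete, here is a minimal sketch of one way such an aligner could be built: a dynamic program that scores candidate sentence groupings by how well their character lengths match. It is an illustration under stated assumptions, not the program described in the paper. The names `length_cost` and `align` are hypothetical, and the constants `MEAN_RATIO`, `VARIANCE`, and `PENALTY` are placeholder values chosen for illustration rather than figures taken from the text above.

```python
import math

# Placeholder parameters (assumptions for illustration only).
MEAN_RATIO = 1.0   # assumed target characters per source character
VARIANCE = 6.8     # assumed variance of the length difference per character
# Assumed penalties for each grouping pattern (source count, target count).
PENALTY = {(1, 1): 0.0, (1, 0): 4.5, (0, 1): 4.5,
           (2, 1): 2.3, (1, 2): 2.3, (2, 2): 4.4}


def length_cost(src_len, tgt_len):
    """Negative log of the two-tailed normal probability of the length mismatch."""
    if src_len == 0 and tgt_len == 0:
        return 0.0
    mean = (src_len + tgt_len / MEAN_RATIO) / 2.0
    delta = (tgt_len - src_len * MEAN_RATIO) / math.sqrt(mean * VARIANCE)
    p = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(p, 1e-300))  # clamp to avoid log(0)


def align(src, tgt):
    """Return a list of (source_indices, target_indices) groups of minimal cost."""
    n, m = len(src), len(tgt)
    src_lens = [len(s) for s in src]
    tgt_lens = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward pass: extend every reachable cell by each grouping pattern.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + length_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the end of both texts to recover the groups.
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    groups.reverse()
    return groups
```

Calling `align(source_sentences, target_sentences)` returns index groups, so a pair such as `([0, 1], [0])` means the first two source sentences are matched to the first target sentence. Only sentence lengths are used; no dictionary or word-level information is consulted, which is what keeps this kind of approach simple and language-independent.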