Paper: Aligning More Words With High Precision For Small Bilingual Corpora

ACL ID C96-1037
Title Aligning More Words With High Precision For Small Bilingual Corpora
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1996
Authors

In this paper, we propose an algorithm for aligning words with their translations in a bilingual corpus. Conventional algorithms are based on word-by-word models that require bilingual data with hundreds of thousands of sentences for training. Under a word-based approach, less frequent words, or words with diverse translations, generally lack statistically significant evidence for confident alignment. Consequently, incomplete or incorrect alignments occur. Our algorithm attempts to handle this problem using class-based rules that are automatically acquired from bilingual materials such as a bilingual corpus or a machine-readable dictionary. The procedures for acquiring these rules are also described. We found that the algorithm can align over 80% of word pairs while maintaining a comparably ...
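The idea of class-based rules can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes several hypothetical pieces. Each word maps to a set of semantic class labels, using toy tables (`SRC_CLASSES`, `TGT_CLASSES`) that stand in for whatever class resources a real system would use. A "rule" is a (source class, target class) pair whose strength is its co-occurrence count over dictionary entries. Alignment is a simple greedy pass. The example shows how "horse" and "caballo", which never appear together in the toy dictionary, can still be linked through a rule learned from other animal words.

```python
from collections import Counter
from itertools import product

# Hypothetical word-to-class tables (toy data; a real system would draw
# classes from a thesaurus or similar resource).
SRC_CLASSES = {"the": set(), "dog": {"ANIMAL"}, "cat": {"ANIMAL"},
               "horse": {"ANIMAL"}, "fish": {"ANIMAL", "FOOD"},
               "bread": {"FOOD"}, "eats": {"EAT"}}
TGT_CLASSES = {"el": set(), "perro": {"ANIMAL"}, "gato": {"ANIMAL"},
               "caballo": {"ANIMAL"}, "pescado": {"FOOD"},
               "pan": {"FOOD"}, "come": {"EAT"}}

# Hypothetical bilingual dictionary entries (source word, target word).
DICTIONARY = [("dog", "perro"), ("cat", "gato"), ("eats", "come"),
              ("bread", "pan"), ("fish", "pescado")]


def acquire_rules(entries, src_classes, tgt_classes, min_count=2):
    """Count class-pair co-occurrences over dictionary entries and keep
    pairs seen at least min_count times as class-based rules."""
    counts = Counter()
    for src, tgt in entries:
        for pair in product(src_classes.get(src, ()), tgt_classes.get(tgt, ())):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


def rule_score(s, t, rules, src_classes, tgt_classes):
    """Strongest rule connecting any class of s to any class of t (0 if none)."""
    pairs = product(src_classes.get(s, ()), tgt_classes.get(t, ()))
    return max((rules.get(p, 0) for p in pairs), default=0)


def align(src_sent, tgt_sent, rules, src_classes, tgt_classes):
    """Greedy sketch: link each source word to the unused target word with
    the highest positive rule score; words with no supporting rule stay unlinked."""
    links, used = [], set()
    for i, s in enumerate(src_sent):
        best = None  # (score, target index)
        for j, t in enumerate(tgt_sent):
            if j in used:
                continue
            score = rule_score(s, t, rules, src_classes, tgt_classes)
            if score > 0 and (best is None or score > best[0]):
                best = (score, j)
        if best is not None:
            used.add(best[1])
            links.append((i, best[1]))
    return links


rules = acquire_rules(DICTIONARY, SRC_CLASSES, TGT_CLASSES, min_count=1)
links = align(["the", "horse", "eats", "bread"],
              ["el", "caballo", "come", "pan"],
              rules, SRC_CLASSES, TGT_CLASSES)
print(links)  # [(1, 1), (2, 2), (3, 3)]
```

The function words "the" and "el" carry no class in this toy data, so they stay unaligned. The greedy, count-based scoring is only a stand-in chosen for brevity. A practical aligner would also weigh word-level evidence and positional information.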