Paper: A Robust Cross-Style Bilingual Sentences Alignment Model

ACL ID C02-1009
Title A Robust Cross-Style Bilingual Sentences Alignment Model
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2002
Authors

Most current sentence alignment approaches adopt sentence length and cognates as the alignment features, and they are mostly trained and tested on documents of the same style. Since the length distribution, the alignment-type distribution (used by length-based approaches), and cognate frequency vary significantly across texts of different styles, length-based approaches fail to achieve similar performance when tested on corpora of different styles. The experiments show that the F-measure can drop from 98.2% to 85.6% when a length-based approach is trained on a technical manual and then tested on a general magazine. Since a large percentage of content words in the source text would be translated into the corresponding translation duals to preserve the meaning in the targ...