Paper: A Robust Cross-Style Bilingual Sentences Alignment Model

ACL ID C02-1009
Title A Robust Cross-Style Bilingual Sentences Alignment Model
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2002

Most current sentence alignment approaches adopt sentence length and cognates as the alignment features, and they are mostly trained and tested on documents of the same style. Since the length distribution, the alignment-type distribution (used by length-based approaches) and the cognate frequency vary significantly across texts with different styles, the length-based approaches fail to achieve similar performance when tested on corpora of different styles. The experiments show that the performance in F-measure could drop from 98.2% to 85.6% when a length-based approach is trained on a technical manual and then tested on a general magazine. Since a large percentage of content words in the source text would be translated into the corresponding translation duals to preserve the meaning in the targ...