Paper: Reliable Measures For Aligning Japanese-English News Articles And Sentences

ACL ID P03-1010
Title Reliable Measures For Aligning Japanese-English News Articles And Sentences
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2003
Authors

We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignments. To remove these, we propose two measures (scores) that evaluate the validity of alignments. The measure for article alignment uses similarities in sentences aligned by DP matching and that for sentence alignment uses similarities in articles aligned by CLIR. They enhance each other to improve the accuracy of alignment. Using these measures, we have successfully constructed a large-sc...
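To illustrate the general idea of DP-based sentence alignment and of scoring an article pair from its aligned sentences, here is a minimal Python sketch. It is not the paper's implementation: the function names (`jaccard`, `align_sentences`, `article_score`), the `skip_penalty` parameter, the word-overlap similarity, and the toy data are all assumptions made for illustration. It assumes the Japanese sentences have already been glossed into English words (for example with a bilingual dictionary) and only handles 1-to-1 matches and skips.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Hypothetical similarity: word-set overlap between two sentences."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def align_sentences(
    src: List[str],
    tgt: List[str],
    sim: Callable[[str, str], float] = jaccard,
    skip_penalty: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Monotone DP alignment allowing 1-to-1 matches and skipped sentences.

    Returns (src_index, tgt_index, similarity) for each matched pair.
    """
    n, m = len(src), len(tgt)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # match src[i-1] with tgt[j-1]
                s = score[i - 1][j - 1] + sim(src[i - 1], tgt[j - 1])
                candidates.append((s, (i - 1, j - 1)))
            if i > 0:  # leave src[i-1] unaligned
                candidates.append((score[i - 1][j] - skip_penalty, (i - 1, j)))
            if j > 0:  # leave tgt[j-1] unaligned
                candidates.append((score[i][j - 1] - skip_penalty, (i, j - 1)))
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the final cell, collecting the diagonal (match) steps.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            pairs.append((pi, pj, sim(src[pi], tgt[pj])))
        i, j = pi, pj
    pairs.reverse()
    return pairs


def article_score(pairs: List[Tuple[int, int, float]]) -> float:
    """Hypothetical article-level measure: mean similarity of aligned pairs."""
    if not pairs:
        return 0.0
    return sum(s for _, _, s in pairs) / len(pairs)


# Toy data: source sentences are assumed to be English glosses of Japanese.
src_sentences = ["prime minister visits tokyo", "stocks fall sharply", "weather is fine"]
tgt_sentences = ["the prime minister visits tokyo", "stocks fall sharply today"]

pairs = align_sentences(src_sentences, tgt_sentences)
print(pairs)
print(article_score(pairs))
```

In this toy run the third source sentence has no counterpart and is skipped, and the article-level score is simply the average similarity of the two matched pairs. An article pair whose sentences align poorly would receive a low score and could be filtered out, which is the intuition behind using sentence-level evidence to validate article alignments.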