Paper: Expectations of Word Sense in Parallel Corpora

ACL ID N12-1078
Title Expectations of Word Sense in Parallel Corpora
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2012

Given a parallel corpus, if two distinct words in language A, a1 and a2, are aligned to the same word b1 in language B, then this might signal that b1 is polysemous, or it might signal that a1 and a2 are synonyms. Both assumptions have been put forward in the literature, each with successful work. We investigate these assumptions, along with other questions of word sense, by looking at sampled parallel sentences containing tokens of the same type in English, asking how often they mean the same thing when they are (1) aligned to the same foreign type and (2) aligned to different foreign types. Results for French-English and Chinese-English parallel corpora show similar behavior: synonymy is only very weakly the more prevalent scenario, and both cases regularly occur.
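
The sampling setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `pair_tokens_by_alignment`, the `(sentence_id, english_type, foreign_type)` record layout, and the toy French alignments are all assumptions made for illustration. The sketch only partitions pairs of tokens of the same English type by whether their aligned foreign types match; judging whether each pair shares a sense would still require human annotation.

```python
from collections import defaultdict
from itertools import combinations


def pair_tokens_by_alignment(aligned_tokens):
    """Partition pairs of same-type English tokens by foreign alignment.

    aligned_tokens: iterable of (sentence_id, english_type, foreign_type)
    records (a hypothetical layout, assumed for this sketch).
    Returns a dict with two lists of (english_type, sent_id_1, sent_id_2).
    """
    # Collect every occurrence of each English type.
    by_english = defaultdict(list)
    for sent_id, english_type, foreign_type in aligned_tokens:
        by_english[english_type].append((sent_id, foreign_type))

    pairs = {"same_foreign": [], "different_foreign": []}
    for english_type, occurrences in by_english.items():
        # Consider every unordered pair of tokens of this English type.
        for (s1, f1), (s2, f2) in combinations(occurrences, 2):
            key = "same_foreign" if f1 == f2 else "different_foreign"
            pairs[key].append((english_type, s1, s2))
    return pairs


# Toy data, invented for illustration only.
toy_alignments = [
    (1, "bank", "banque"),
    (2, "bank", "banque"),
    (3, "bank", "rive"),
    (4, "spring", "printemps"),
    (5, "spring", "ressort"),
]
result = pair_tokens_by_alignment(toy_alignments)
```

Pairs in `result["same_foreign"]` correspond to case (1) in the abstract and pairs in `result["different_foreign"]` to case (2); a sample drawn from each list would then be shown to annotators.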