Paper: Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora

ACL ID D14-1177
Title Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

Automatically compiling bilingual dictionaries of technical terms from comparable corpora is a challenging problem, yet with many potential applications. In this paper, we exploit two independent observations about term translations: (a) terms are often formed by corresponding sub-lexical units across languages and (b) a term and its translation tend to appear in similar lexical context. Based on the first observation, we develop a new character n-gram compositional method, a logistic regression classifier, for learning a string similarity measure of term translations. According to the second observation, we use an existing context-based approach. For evaluation, we investigate the performance of compositional and context-based methods on: (a) similar and unrelated language...
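
The character n-gram idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature encoding or training procedure, so everything below is an assumption: features are taken to be pairs of source and target character n-grams, the logistic regression is trained with plain stochastic gradient steps, and the English-French term pairs are hypothetical toy data rather than the paper's. The names `char_ngrams`, `pair_features`, `train` and `score` are illustrative only.

```python
import math


def char_ngrams(term, n_min=2, n_max=4):
    """Character n-grams of a term, with ^ and $ marking its boundaries."""
    padded = "^" + term.lower() + "$"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def pair_features(source, target):
    """Assumed encoding: every (source n-gram, target n-gram) pair is a binary feature."""
    return {(s, t) for s in char_ngrams(source) for t in char_ngrams(target)}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, source, target):
    """Estimated probability that target is a translation of source."""
    weights, bias = model
    feats = pair_features(source, target)
    scale = 1.0 / math.sqrt(len(feats))  # keep the feature vector at unit length
    return sigmoid(bias + scale * sum(weights.get(f, 0.0) for f in feats))


def train(examples, epochs=200, lr=0.5):
    """Fit logistic regression weights with per-example gradient steps."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for source, target, label in examples:
            feats = pair_features(source, target)
            scale = 1.0 / math.sqrt(len(feats))
            z = bias + scale * sum(weights.get(f, 0.0) for f in feats)
            step = lr * (label - sigmoid(z))  # gradient of the log-likelihood
            bias += step
            for f in feats:
                weights[f] = weights.get(f, 0.0) + step * scale
    return weights, bias


# Hypothetical toy data: label 1 for a translation pair, 0 for a mismatched pair.
EXAMPLES = [
    ("cardiology", "cardiologie", 1),
    ("neurology", "neurologie", 1),
    ("hepatitis", "hépatite", 1),
    ("dermatology", "dermatologie", 1),
    ("cardiology", "hépatite", 0),
    ("neurology", "dermatologie", 0),
    ("hepatitis", "cardiologie", 0),
    ("dermatology", "neurologie", 0),
]

model = train(EXAMPLES)
print(score(model, "cardiology", "cardiologie"))
print(score(model, "cardiology", "hépatite"))
```

The learned score acts as a string similarity measure: candidate translations of a source term can be ranked by `score`. A realistic setup would need far more training pairs and likely some feature selection, since the number of n-gram pair features grows quickly with term length.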