Paper: Robust Measurement and Comparison of Context Similarity for Finding Translation Pairs

ACL ID C10-1003
Title Robust Measurement and Comparison of Context Similarity for Finding Translation Pairs
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2010
Authors

In cross-language information retrieval it is often important to align words that are similar in meaning in two corpora written in different languages. Previous research shows that using context similarity to align words is helpful when no dictionary entry is available. We suggest a new method which selects a subset of words (pivot words) associated with a query and then matches these words across languages. To detect word associations, we demonstrate that a new Bayesian method for estimating Point-wise Mutual Information provides improved accuracy. In the second step, matching is done in a novel way that calculates the chance of an accidental overlap of pivot words using the hypergeometric distribution. We implemented a wide variety of previously suggested methods. Testing i...
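
The matching step described in the abstract can be illustrated with a minimal sketch of a hypergeometric tail probability. This is not the authors' implementation: the function name, the parameter names, and the choice to score by the upper tail P(X >= overlap) are assumptions made for illustration. The sketch treats the pivot words of a candidate translation as a random draw from a shared vocabulary and asks how likely it is that at least the observed number of them coincide with the query's pivot words by chance.

```python
from math import comb


def overlap_tail_probability(vocab_size, query_pivots, candidate_pivots, overlap):
    """Chance of an accidental overlap of at least `overlap` pivot words.

    Hypothetical parameters (not taken from the paper):
      vocab_size       -- number of words that could serve as pivot words
      query_pivots     -- number of pivot words selected for the query
      candidate_pivots -- number of pivot words selected for the candidate
      overlap          -- number of pivot words the two sets share
    """
    if not 0 <= query_pivots <= vocab_size:
        raise ValueError("query_pivots must lie in [0, vocab_size]")
    if not 0 <= candidate_pivots <= vocab_size:
        raise ValueError("candidate_pivots must lie in [0, vocab_size]")

    max_overlap = min(query_pivots, candidate_pivots)
    start = max(overlap, 0)

    # Number of ways to draw the candidate's pivot set from the vocabulary.
    total = comb(vocab_size, candidate_pivots)

    # Sum the hypergeometric probability mass over all overlaps >= `overlap`.
    tail = sum(
        comb(query_pivots, k) * comb(vocab_size - query_pivots, candidate_pivots - k)
        for k in range(start, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, chosen only to show the call; they are not from the paper.
    p = overlap_tail_probability(
        vocab_size=1000, query_pivots=20, candidate_pivots=20, overlap=5
    )
    print(f"P(accidental overlap >= 5) = {p:.3e}")
```

Under this reading, a smaller probability means the shared pivot words are less likely to be a coincidence, so candidates could be ranked by ascending probability. Exact integer binomials from `math.comb` avoid floating-point underflow for small vocabularies; for large ones, a log-space computation would be the more practical choice.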