Paper: Similarity-Based Estimation Of Word Cooccurrence Probabilities

ACL ID P94-1038
Title Similarity-Based Estimation Of Word Cooccurrence Probabilities
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1994
Authors

In many applications of natural language processing it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination according to its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in a given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe a probabilistic word association model based on distributional word similarity, and apply it to improving probability...
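To make the idea of borrowing evidence from "most similar" words concrete, here is a minimal sketch. It is not the model described in the paper. It assumes a toy set of word pairs, uses cosine similarity over co-occurrence counts as a stand-in similarity measure (chosen only for brevity; the abstract does not specify one), and estimates the probability of an unseen pair as a similarity-weighted average over the nearest neighbours of the first word. The names `pairs`, `mle`, `cosine`, and `p_sim`, and the neighbour count `k`, are all hypothetical.

```python
import math
from collections import Counter

# Toy (first word, second word) pairs; purely illustrative data.
pairs = [
    ("eat", "apple"), ("eat", "apple"), ("eat", "pear"),
    ("devour", "apple"), ("devour", "pear"), ("devour", "peach"),
    ("visit", "beach"), ("visit", "beach"), ("visit", "city"),
]

# counts[w1][w2] = number of times w2 was observed after w1.
counts = {}
for w1, w2 in pairs:
    counts.setdefault(w1, Counter())[w2] += 1


def mle(w2, w1):
    """Relative-frequency estimate of P(w2 | w1); zero for unseen pairs."""
    row = counts.get(w1, Counter())
    total = sum(row.values())
    return row[w2] / total if total else 0.0


def cosine(a, b):
    """Cosine similarity of two words' co-occurrence count vectors
    (an assumed stand-in for a distributional similarity measure)."""
    ra, rb = counts.get(a, Counter()), counts.get(b, Counter())
    dot = sum(ra[w] * rb[w] for w in ra)
    na = math.sqrt(sum(c * c for c in ra.values()))
    nb = math.sqrt(sum(c * c for c in rb.values()))
    return dot / (na * nb) if na and nb else 0.0


def p_sim(w2, w1, k=2):
    """Similarity-weighted average of P(w2 | w1') over the k words w1'
    most similar to w1."""
    neighbours = sorted(
        ((cosine(w1, other), other) for other in counts if other != w1),
        reverse=True,
    )[:k]
    norm = sum(sim for sim, _ in neighbours)
    if norm == 0:
        return 0.0
    return sum(sim * mle(w2, other) for sim, other in neighbours) / norm


if __name__ == "__main__":
    # "eat peach" never occurs in the toy data, so its frequency estimate is zero,
    # but "devour" behaves like "eat" and was seen with "peach".
    print(mle("peach", "eat"), p_sim("peach", "eat"), p_sim("beach", "eat"))
```

In this toy setting the unseen pair ("eat", "peach") receives a nonzero estimate because "devour" shares objects with "eat", while ("eat", "beach") stays at zero because the only word seen with "beach" has no objects in common with "eat".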