Paper: A Comparative Study of Word Co-occurrence for Term Clustering in Language Model-based Sentence Retrieval

ACL ID N10-1046
Title A Comparative Study of Word Co-occurrence for Term Clustering in Language Model-based Sentence Retrieval
Venue Human Language Technologies
Session Main Conference
Year 2010
Authors

Sentence retrieval is an important component of question answering systems. Term clustering, in turn, is an effective approach for improving sentence retrieval performance: the more similar the terms in each cluster, the better the performance of the retrieval system. A key step in obtaining appropriate word clusters is accurate estimation of pairwise word similarities, based on the words' tendency to co-occur in similar contexts. In this paper, we compare four different methods for estimating word co-occurrence frequencies from two different corpora. The results show that different, commonly used contexts for defining word co-occurrence differ significantly in retrieval performance. Using an appropriate co-occurrence criterion and corpus is shown to improve the mean average precisio...
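To make the idea of a co-occurrence "context" concrete, here is a minimal Python sketch that counts word pairs under two commonly used context definitions: same sentence, and a fixed-size token window. It then scores a pair with the Dice coefficient. The two contexts, the window size, the Dice measure, and every function name (`sentence_cooccurrence`, `window_cooccurrence`, `dice`) are illustrative assumptions. The abstract does not say which four methods the paper compares, so this is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def sentence_cooccurrence(sentences):
    """Count unordered word pairs appearing in the same sentence (once per sentence)."""
    counts = Counter()
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            counts[(a, b)] += 1
    return counts


def window_cooccurrence(sentences, window=2):
    """Count unordered word pairs at most `window` tokens apart within a sentence.

    Every token occurrence is counted, so repeated words add extra pairs.
    """
    counts = Counter()
    for sent in sentences:
        for i, a in enumerate(sent):
            for j in range(i + 1, min(i + window + 1, len(sent))):
                b = sent[j]
                if a != b:  # skip self-pairs such as ("the", "the")
                    counts[tuple(sorted((a, b)))] += 1
    return counts


def sentence_frequency(sentences):
    """Number of sentences each word occurs in."""
    return Counter(w for sent in sentences for w in set(sent))


def dice(pair_counts, word_freq, a, b):
    """Dice coefficient 2*c(a,b) / (f(a) + f(b)); returns 0.0 if both frequencies are 0."""
    denom = word_freq[a] + word_freq[b]
    return 2 * pair_counts[tuple(sorted((a, b)))] / denom if denom else 0.0


if __name__ == "__main__":
    toy = [["the", "cat", "sat", "on", "the", "mat"], ["the", "dog", "sat"]]
    sent_counts = sentence_cooccurrence(toy)
    win_counts = window_cooccurrence(toy, window=2)
    print(sent_counts[("cat", "mat")], win_counts[("cat", "mat")])  # 1 0
    print(sent_counts[("sat", "the")], win_counts[("sat", "the")])  # 2 3
    print(dice(sent_counts, sentence_frequency(toy), "sat", "the"))  # 1.0
```

Even on two toy sentences the contexts disagree. The sentence-level count links "cat" and "mat", but a window of 2 does not. The repeated "the" also inflates the window count for ("sat", "the"). Differences like these feed into the similarity estimates and hence into the resulting term clusters.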