Paper: Learning Term-weighting Functions for Similarity Measures

ACL ID D09-1083
Title Learning Term-weighting Functions for Similarity Measures
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009

Measuring the similarity between two texts is a fundamental problem in many NLP and IR applications. Among the ex- isting approaches, the cosine measure of the term vectors representing the origi- nal texts has been widely used, where the score of each term is often determined by a TFIDF formula. Despite its sim- plicity, the quality of such cosine similar- ity measure is usually domain dependent and decided by the choice of the term- weighting function. In this paper, we pro- pose a novel framework that learns the term-weighting function. Given the la- beled pairs of texts as training data, the learning procedure tunes the model pa- rameters by minimizing the specified loss function of the similarity score. Com- pared to traditional TFIDF term-weighting schemes, our approach shows a signi...