Paper: Discriminative Improvements to Distributional Sentence Similarity

ACL ID D13-1090
Title Discriminative Improvements to Distributional Sentence Similarity
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013
Authors

Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.
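
The abstract names TF-KLD but does not define it. The following is a minimal sketch of one plausible reading of a discriminative, KL-divergence-based term weight, and the paper's exact definition may differ. It assumes that each term is scored by the KL divergence between two Bernoulli distributions: the probability that a term present in a sentence pair is shared by both sentences, estimated separately on paraphrase and non-paraphrase pairs. The function names `bernoulli_kl` and `kld_weights`, the additive `smoothing` constant, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bernoulli_kl(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) in nats; needs 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kld_weights(pairs, labels, smoothing=0.05):
    """Hypothetical helper: return one KL-based weight per term.

    pairs  : list of (tokens_a, tokens_b) sentence pairs
    labels : list of 0/1 values, where 1 marks a paraphrase pair
    smoothing : assumed additive constant that keeps probabilities off 0 and 1
    """
    shared = {0: Counter(), 1: Counter()}  # term occurs in both sentences
    single = {0: Counter(), 1: Counter()}  # term occurs in exactly one sentence
    for (tokens_a, tokens_b), label in zip(pairs, labels):
        set_a, set_b = set(tokens_a), set(tokens_b)
        for term in set_a & set_b:
            shared[label][term] += 1
        for term in set_a ^ set_b:
            single[label][term] += 1

    vocab = set()
    for label in (0, 1):
        vocab |= set(shared[label]) | set(single[label])

    weights = {}
    for term in vocab:
        # Smoothed estimate of P(term is shared | term is present, label).
        p = (shared[1][term] + smoothing) / (
            shared[1][term] + single[1][term] + 2 * smoothing)
        q = (shared[0][term] + smoothing) / (
            shared[0][term] + single[0][term] + 2 * smoothing)
        weights[term] = bernoulli_kl(p, q)
    return weights


# Toy labeled data (invented for illustration only).
pairs = [
    (["the", "movie", "is", "not", "good"], ["the", "film", "is", "not", "good"]),
    (["the", "movie", "is", "good"], ["the", "movie", "is", "not", "good"]),
]
labels = [1, 0]
weights = kld_weights(pairs, labels)
```

In this toy data, "the" is shared in both the paraphrase and the non-paraphrase pair, so its two estimated distributions coincide and its weight is zero, while "not" is shared only in the paraphrase pair and receives a positive weight. Under the assumptions above, such weights would rescale term counts before factorization, which is where a label-aware weight can differ from an unsupervised one such as TF-IDF.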