Paper: Detecting Text Similarity Over Short Passages: Exploring Linguistic Feature Combinations Via Machine Learning

ACL ID W99-0625
Title Detecting Text Similarity Over Short Passages: Exploring Linguistic Feature Combinations Via Machine Learning
Venue 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Session Main Conference
Year 1999
Authors

We present a new composite similarity metric that combines information from multiple linguistic indicators to measure semantic distance between pairs of small textual units. Several potential features are investigated and an optimal combination is selected via machine learning. We discuss a more restrictive definition of similarity than traditional, document-level and information retrieval-oriented, notions of similarity, and motivate it by showing its relevance to the multi-document text summarization problem. Results from our system are evaluated against standard information retrieval techniques, establishing that the new method is more effective in identifying closely related textual units.

1 Research Goals

In this paper, we focus on the problem of detecting whether two s...