Paper: Multiple Similarity Measures And Source-Pair Information In Story Link Detection

ACL ID N04-1040
Title Multiple Similarity Measures And Source-Pair Information In Story Link Detection
Venue Human Language Technologies
Session Main Conference
Year 2004
Authors

State-of-the-art story link detection systems, that is, systems that determine whether two sto- ries are about the same event or linked, are usu- ally based on the cosine-similarity measured between two stories. This paper presents a method for improving the performance of a link detection system by using a variety of simi- larity measures and using source-pair specific statistical information. The utility of a num- ber of different similarity measures, including cosine, Hellinger, Tanimoto, and clarity, both alone and in combination, was investigated. We also compared several machine learning techniques for combining the different types of information. The techniques investigated were SVMs, voting, and decision trees, each of which makes use of similarity and statisti- cal information dif...