Paper: Using paraphrases for improving first story detection in news and Twitter

ACL ID N12-1034
Title Using paraphrases for improving first story detection in news and Twitter
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

First story detection (FSD) involves identify- ing first stories about events from a continuous stream of documents. A major problem in this task is the high degree of lexical variation in documents which makes it very difficult to de- tect stories that talk about the same event but expressed using different words. We suggest using paraphrases to alleviate this problem, making this the first work to use paraphrases for FSD. We show a novel way of integrat- ing paraphrases with locality sensitive hashing (LSH) in order to obtain an efficient FSD sys- tem that can scale to very large datasets. Our system achieves state-of-the-art results on the first story detection task, beating both the best supervised and unsupervised systems. To test our approach on large data, we construct a cor- pus of...