Paper: Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection

ACL ID I05-2026
Title Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection
Venue International Joint Conference on Natural Language Processing
Session poster-demo-tutorial
Year 2005
Authors

We present a system to determine content similarity of documents. Our goal is to identify pairs of book chapters that are translations of the same original chapter. Achieving this goal requires identification of not only the different topics in the documents but also of the particular flow of these topics. Our approach to content similarity evaluation employs n-grams of lexical chains and measures similarity using the cosine of vectors of n-grams of lexical chains, vectors of tf*idf-weighted keywords, and vectors of unweighted lexical chains (unigrams of lexical chains). Our results show that n-grams of unordered lexical chains of length four or more are particularly useful for the recognition of content similarity.
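
As a rough illustration of the cosine measure over n-grams of lexical chains, the following minimal sketch assumes each document has already been reduced to a sequence of lexical chain identifiers in order of occurrence. The function names `chain_ngrams` and `cosine`, the string identifiers, and the choice to model an "unordered" n-gram by sorting the chains inside each sliding window are all assumptions made for this example; the abstract does not specify how the system builds its chains or its vectors.

```python
from collections import Counter
from math import sqrt


def chain_ngrams(chain_ids, n, ordered=False):
    """Count n-grams over a sequence of lexical chain identifiers.

    Assumption: an "unordered" n-gram is modelled by sorting the chains
    inside each window, so windows holding the same chains in a different
    order map to the same key.
    """
    grams = []
    for i in range(len(chain_ids) - n + 1):
        window = chain_ids[i:i + n]
        grams.append(tuple(window) if ordered else tuple(sorted(window)))
    return Counter(grams)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical chain sequences for two translations of the same chapter;
# the topics are the same but two neighbouring chains are swapped.
doc_a = ["war", "sea", "ship", "storm", "sea", "island"]
doc_b = ["war", "ship", "sea", "storm", "sea", "island"]

unordered_sim = cosine(chain_ngrams(doc_a, 4), chain_ngrams(doc_b, 4))
ordered_sim = cosine(
    chain_ngrams(doc_a, 4, ordered=True),
    chain_ngrams(doc_b, 4, ordered=True),
)
```

In this toy input the unordered 4-grams still overlap after the local swap, while the ordered 4-grams share nothing. That is one plausible reading of why unordered n-grams tolerate the small reorderings that translation introduces, though the toy data says nothing about the paper's actual results.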