Paper: A Comparison Of Document Sentence And Term Event Spaces

ACL ID P06-1076
Title A Comparison Of Document Sentence And Term Event Spaces
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2006

The trend in information retrieval systems is from document to sub-document retrieval, such as sentences in a summarization system and words or phrases in a question-answering system. Despite this trend, systems continue to model language at a document level using the inverse document frequency (IDF). In this paper, we compare and contrast IDF with inverse sentence frequency (ISF) and inverse term frequency (ITF). A direct comparison reveals that all language models are highly correlated; however, the average ISF and ITF values are 5.5 and 10.4 higher than IDF. All language models appeared to follow a power law distribution with a slope coefficient of 1.6 for documents and 1.7 for sentences and terms. We conclude with an analysis of IDF stability with respect to random, journal, an...
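
To make the three event spaces concrete, here is a minimal sketch of how IDF, ISF, and ITF might be computed over the same corpus. The abstract does not give the exact formulas, so everything here is an assumption: each score is taken to be log2(N / n), where N is the number of documents, sentences, or term occurrences in the collection and n is the number of those units that contain (or are) the term. The function name `inverse_frequencies` and the nested-list input format are hypothetical.

```python
import math
from collections import Counter


def inverse_frequencies(documents):
    """Compute IDF, ISF, and ITF for every term in a corpus.

    `documents` is a list of documents, each a list of sentences,
    each a list of tokens. The log2(N / n) form and the base-2
    logarithm are assumptions, not taken from the paper.
    """
    doc_freq = Counter()   # number of documents containing the term
    sent_freq = Counter()  # number of sentences containing the term
    term_freq = Counter()  # number of occurrences of the term
    n_docs = len(documents)
    n_sents = 0
    n_terms = 0

    for doc in documents:
        doc_terms = set()
        for sent in doc:
            n_sents += 1
            n_terms += len(sent)
            term_freq.update(sent)        # count every occurrence
            sent_freq.update(set(sent))   # count each sentence once
            doc_terms.update(sent)
        doc_freq.update(doc_terms)        # count each document once

    return {
        term: {
            "idf": math.log2(n_docs / doc_freq[term]),
            "isf": math.log2(n_sents / sent_freq[term]),
            "itf": math.log2(n_terms / term_freq[term]),
        }
        for term in term_freq
    }


corpus = [
    [["the", "cat", "sat"], ["the", "dog"]],
    [["a", "cat"]],
]
scores = inverse_frequencies(corpus)
print(scores["the"])
```

Because a collection has more sentences than documents and more term occurrences than sentences, the numerators grow from IDF to ISF to ITF, so under these assumed definitions ISF and ITF values would generally come out larger than IDF for the same term, in line with the direction of the differences reported above.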