Paper: Document Representation In Natural Language Text Retrieval

ACL ID H94-1072
Title Document Representation In Natural Language Text Retrieval
Venue Human Language Technologies
Session Main Conference
Year 1994
Authors

In information retrieval, the content of a document may be represented as a collection of terms: words, stems, phrases, or other units derived or inferred from the text of the document. These terms are usually weighted to indicate their importance within the document, which can then be viewed as a vector in an N-dimensional space. In this paper we demonstrate that proper term weighting is at least as important as term selection, and that different types of terms (e.g., words, phrases, names), and terms derived by different means (e.g., statistical, linguistic), must be treated differently for maximum benefit in retrieval. We report some observations made during and after the second Text REtrieval Conference (TREC-2).
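To make the vector view concrete, the following is a minimal sketch of representing documents as weighted term vectors. It assumes a plain tf-idf weighting (term frequency times the log of inverse document frequency), which is a common textbook scheme and not necessarily the weighting used in the paper. The function name `tfidf_vectors` and the toy documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to weighted term vectors.

    docs: list of token lists (tokens could be words, stems, or phrases).
    Returns (vocab, vectors), where each vector has one weight per
    vocabulary term, so N = len(vocab) is the dimensionality.
    Assumption: weight = tf * log(n_docs / df), a plain tf-idf variant.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # missing terms count as 0
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


# Toy collection (hypothetical data).
docs = [
    ["term", "weighting", "retrieval"],
    ["phrase", "retrieval", "retrieval"],
]
vocab, vectors = tfidf_vectors(docs)
for doc_vector in vectors:
    print(dict(zip(vocab, doc_vector)))
```

In this toy collection the term "retrieval" occurs in every document, so its weight is zero under this scheme no matter how often it appears, while terms unique to one document receive a positive weight. This illustrates why the choice of weighting can matter as much as the choice of terms.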