Paper: Less Is More: Eliminating Index Terms From Subordinate Clauses

ACL ID P99-1045
Title Less Is More: Eliminating Index Terms From Subordinate Clauses
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1999
Authors

We perform a linguistic analysis of documents during indexing for information retrieval. Eliminating index terms that occur only in subordinate clauses reduces index size by approximately 30% without adversely affecting precision or recall. These results hold for two corpora: a sample of the World Wide Web and an electronic encyclopedia.
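
The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that clause detection has already been done upstream, so each document arrives as a list of (clause text, is_subordinate) pairs. The function names, the toy documents, and the whitespace tokenizer are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Naive tokenizer (an assumption): split on whitespace, strip punctuation, lowercase."""
    tokens = (word.strip(PUNCT).lower() for word in text.split())
    return [token for token in tokens if token]


def build_index(docs, drop_subordinate_only):
    """Build an inverted index mapping term -> set of document ids.

    docs maps a document id to a list of (clause_text, is_subordinate) pairs.
    When drop_subordinate_only is True, a term is indexed for a document only
    if it occurs in at least one main clause of that document.
    """
    index = defaultdict(set)
    for doc_id, clauses in docs.items():
        main_terms = set()
        all_terms = set()
        for text, is_subordinate in clauses:
            terms = set(tokenize(text))
            all_terms |= terms
            if not is_subordinate:
                main_terms |= terms
        kept = main_terms if drop_subordinate_only else all_terms
        for term in kept:
            index[term].add(doc_id)
    return index


def posting_count(index):
    """Index size measured as the total number of (term, document) postings."""
    return sum(len(postings) for postings in index.values())


# Toy documents with hand-labelled clauses (hypothetical data).
docs = {
    "d1": [("The committee approved the budget", False),
           ("although several members objected", True)],
    "d2": [("Members objected to the budget", False),
           ("because the committee ignored them", True)],
}

full = build_index(docs, drop_subordinate_only=False)
pruned = build_index(docs, drop_subordinate_only=True)
reduction = 1 - posting_count(pruned) / posting_count(full)
```

In this sketch a term such as "objected" stays indexed for d2, where it occurs in a main clause, but is dropped for d1, where it occurs only in a subordinate clause. The reduction computed on toy data like this says nothing about the 30% figure reported for the two corpora.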