Paper: Lexicon Effects On Chinese Information Retrieval

ACL ID W97-0316
Title Lexicon Effects On Chinese Information Retrieval
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 1997
  • Kui Lam Kwok (City University of New York-Queens College, Flushing NY)

We investigate the effects of lexicon size and stopwords on Chinese information retrieval using our method of short-word segmentation, which is based on simple language usage rules and statistics. These rules allow us to employ a small lexicon of only 2,175 entries and still provide quite admirable retrieval results. We observe that accurate segmentation is not essential for good retrieval. Larger lexicons can lead to incremental improvements. The presence of stopwords does not contribute much noise to IR. Removing them risks eliminating crucial words from a query and can adversely affect retrieval, especially when the queries are short. Short queries of a few words perform more than 10% worse than paragraph-size queries.
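
For readers unfamiliar with lexicon-driven segmentation, the following is a minimal sketch of the general idea, not the paper's method: the abstract does not spell out its usage rules or statistics, so this substitutes plain greedy forward longest-match against a small lexicon, with unmatched characters kept as single-character terms. The names `segment` and `index_terms`, the toy lexicon, and the `max_len` limit are all assumptions made for illustration. Stopword removal is left optional, in line with the abstract's observation that removing stopwords can hurt short queries.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward longest-match segmentation (illustrative only).

    `lexicon` is a set of known short words; `max_len` is an assumed
    upper bound on word length in characters.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
        else:
            # No lexicon entry starts here: keep the single character.
            tokens.append(text[i])
            i += 1
    return tokens


def index_terms(tokens, stopwords=frozenset()):
    """Optionally drop stopwords; by default every token is kept."""
    return [t for t in tokens if t not in stopwords]


# Toy lexicon and stopword list (hypothetical, not from the paper).
toy_lexicon = {"中国", "信息", "检索"}
toy_stopwords = {"的"}

tokens = segment("中国的信息检索", toy_lexicon)
kept_all = index_terms(tokens)
filtered = index_terms(tokens, toy_stopwords)
```

Here `tokens` keeps the particle 的 as its own term, and `filtered` shows the effect of removing it. In a very short query, a stopword list that is too aggressive could remove a term the query depends on.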