Paper: Exploiting Sophisticated Representations For Document Retrieval

ACL ID A94-1011
Title Exploiting Sophisticated Representations For Document Retrieval
Venue Applied Natural Language Processing Conference
Session Main Conference
Year 1994

The use of NLP techniques for document classification has not produced significant improvements in performance within the standard term weighting statistical assignment paradigm (Fagan, 1987; Lewis, 1992bc; Buckley, 1993). This perplexing fact needs both an explanation and a solution if the power of recently developed NLP techniques is to be successfully applied in IR. A novel method for adding linguistic annotation to corpora is presented. It uses a statistical POS tagger in conjunction with unsupervised structure-finding methods to derive notions of "noun group", "verb group", and so on; it is inherently extensible to more sophisticated annotation and does not require a pre-tagged corpus to fit. One of the distinguishing features of a more linguistica...