Paper: Empirical Term Weighting And Expansion Frequency

ACL ID W00-1315
Title Empirical Term Weighting And Expansion Frequency
Venue 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Session Main Conference
Year 2000

We propose an empirical method for estimating term weights directly from relevance judgements, avoiding various standard but potentially troublesome assumptions. It is common to assume, for example, that weights vary with term frequency (tf) and inverse document frequency (idf) in a particular way, e.g., tf · idf, but the fact that there are so many variants of this formula in the literature suggests that there remains considerable uncertainty about these assumptions. Our method is similar to the Berkeley regression method where labeled relevance judgements are fit as a linear combination of (transforms of) tf, idf, etc. Training methods not only improve performance, but also extend naturally to include additional factors such as burstiness and query expansion. The proposed ...
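
The regression idea mentioned in the abstract, fitting labeled relevance judgements as a linear combination of transforms of tf and idf, can be illustrated with a minimal sketch. This is not the paper's method or the Berkeley system. The feature transforms (log(1 + tf) and -log2(df/N)), the logistic link, the plain gradient-descent fit, and the six toy judgements are all assumptions chosen only to make the example self-contained.

```python
import math

def idf(df, n_docs):
    """Inverse document frequency, taken here as -log2(df / N) (an assumption)."""
    return -math.log2(df / n_docs)

def features(tf, df, n_docs):
    # Bias term, a log transform of tf, and idf. These transforms are
    # illustrative choices, not ones taken from the paper.
    return [1.0, math.log(1.0 + tf), idf(df, n_docs)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the logistic loss."""
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def score(w, tf, df, n_docs):
    """Learned weight (log-odds of relevance) for a term occurrence."""
    return sum(wi * xi for wi, xi in zip(w, features(tf, df, n_docs)))

# Hypothetical toy judgements: (tf, df, relevant?) in a 1000-document collection.
N_DOCS = 1000
toy = [(5, 10, 1), (3, 20, 1), (4, 50, 1),
       (1, 500, 0), (1, 800, 0), (2, 900, 0)]

rows = [features(tf, df, N_DOCS) for tf, df, _ in toy]
labels = [rel for _, _, rel in toy]
w = fit_logistic(rows, labels)

# A frequent-in-document, rare-in-collection term should outscore a common one.
print(score(w, 5, 10, N_DOCS) > score(w, 1, 800, N_DOCS))
```

Because the weights are learned from judgements instead of fixed in advance, a further factor such as a burstiness measure could be added in this sketch as one more entry in the feature vector.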