Paper: A Method Of Measuring Term Representativeness - Baseline Method Using Co-Occurrence Distribution

ACL ID C00-1047
Title A Method Of Measuring Term Representativeness - Baseline Method Using Co-Occurrence Distribution
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000
Authors

This paper introduces a scheme, which we call the baseline method, for defining measures of term representativeness, and presents measures defined using the scheme. The representativeness of a term is measured by a normalized characteristic value defined for the set of all documents that contain the term. Normalization is done by comparing the original characteristic value with the characteristic value defined for a randomly chosen document set of the same size. The latter value is estimated by a baseline function obtained by random sampling and logarithmic linear approximation. We found that the distance between the word distribution in a document set and the word distribution in the whole corpus is an effective characteristic value to use for the baseline method. Measures defined by the baseline meth...
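
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: Kullback-Leibler divergence stands in for the distance between word distributions, the size of a document set is counted in documents, the baseline is a straight line fitted by least squares in log-log space, and normalization is a plain ratio. The function names (`word_dist`, `distance`, `fit_baseline`, `representativeness`) are hypothetical.

```python
import math
import random
from collections import Counter


def word_dist(docs):
    """Relative word frequencies over a list of tokenized documents."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document set contains no words")
    return {w: c / total for w, c in counts.items()}


def distance(docs, corpus_dist):
    """Characteristic value of a document set.

    Assumption: KL divergence of the set's word distribution from the
    corpus distribution. `docs` must be drawn from the corpus, so every
    word has a nonzero corpus probability.
    """
    p = word_dist(docs)
    return sum(pw * math.log(pw / corpus_dist[w]) for w, pw in p.items())


def fit_baseline(corpus, sizes, trials=20, seed=0):
    """Estimate the baseline function by random sampling.

    For each size in `sizes`, draw random document sets, measure their
    distance to the corpus, and fit log(distance) = a + b * log(size)
    by ordinary least squares. Returns a function size -> expected distance.
    """
    rng = random.Random(seed)
    corpus_dist = word_dist(corpus)
    xs, ys = [], []
    for n in sizes:
        for _ in range(trials):
            d = distance(rng.sample(corpus, n), corpus_dist)
            if d > 0:  # log is undefined at zero; skip degenerate samples
                xs.append(math.log(n))
                ys.append(math.log(d))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda n: math.exp(intercept + slope * math.log(n))


def representativeness(term, corpus, baseline, corpus_dist):
    """Distance for the documents containing `term`, divided by the
    baseline value for a random document set of the same size
    (the ratio is an assumed form of normalization)."""
    docs = [doc for doc in corpus if term in doc]
    if not docs:
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return distance(docs, corpus_dist) / baseline(len(docs))
```

To use the sketch, represent the corpus as a list of token lists, call `fit_baseline(corpus, sizes)` once with at least two distinct sizes no larger than the corpus, and then call `representativeness` for each term of interest. Under these assumptions, a term that appears in every document scores zero, because the documents containing it have exactly the corpus word distribution.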