Paper: Cross-Instance Tuning of Unsupervised Document Clustering Algorithms

ACL ID N07-1032
Title Cross-Instance Tuning of Unsupervised Document Clustering Algorithms
Venue Human Language Technologies
Session Main Conference
Year 2007
Authors

In unsupervised learning, where no training takes place, one simply hopes that the unsupervised learner will work well on any unlabeled test collection. However, when the variability in the data is large, such hope may be unrealistic; a tuning of the unsupervised algorithm may then be necessary in order to perform well on new test collections. In this paper, we show how to perform such a tuning in the context of unsupervised document clustering, by (i) introducing a degree of freedom, α, into two leading information-theoretic clustering algorithms, through the use of generalized mutual information quantities; and (ii) selecting the value of α based on clusterings of similar, but supervised document collections (cross-instance tuning). One option is to perform a tuning that direct...
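
Step (ii) of the abstract, choosing α from clusterings of labeled collections, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `tune_alpha`, `purity`, and `toy_cluster` are hypothetical names, cluster purity is only a stand-in quality measure, the candidate α values are arbitrary, and the toy clusterer merely thresholds numbers rather than implementing either information-theoretic algorithm.

```python
from statistics import mean


def purity(pred, gold):
    """Fraction of items that belong to the majority gold class of their cluster."""
    counts = {}
    for p, g in zip(pred, gold):
        per_cluster = counts.setdefault(p, {})
        per_cluster[g] = per_cluster.get(g, 0) + 1
    return sum(max(c.values()) for c in counts.values()) / len(gold)


def tune_alpha(cluster_fn, labeled_collections, alphas, score_fn=purity):
    """Pick the alpha whose clusterings score best, on average, on labeled collections.

    cluster_fn(docs, alpha) -> list of cluster ids (hypothetical interface).
    labeled_collections     -> iterable of (docs, gold_labels) pairs.
    """
    def average_score(alpha):
        return mean(
            score_fn(cluster_fn(docs, alpha), gold)
            for docs, gold in labeled_collections
        )

    return max(alphas, key=average_score)


def toy_cluster(docs, alpha):
    """Toy stand-in for a clustering algorithm: split 1-D 'documents' at alpha."""
    return [int(x > alpha) for x in docs]


# Two small labeled collections, standing in for the supervised collections.
collections = [
    ([0.1, 0.2, 0.8, 0.9], ["a", "a", "b", "b"]),
    ([0.3, 0.4, 0.6, 0.7], ["x", "x", "y", "y"]),
]

best_alpha = tune_alpha(toy_cluster, collections, alphas=[0.0, 0.5, 1.0])
```

The selected value would then be reused, unchanged, when clustering a new unlabeled collection; the sketch assumes that the labeled collections resemble the unlabeled one closely enough for the choice to transfer.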