Paper: Automatic Cluster Stopping With Criterion Functions And The Gap Statistic

ACL ID N06-4007
Title Automatic Cluster Stopping With Criterion Functions And The Gap Statistic
Venue Human Language Technologies
Session System Demonstration
Year 2006
Authors

SenseClusters is a freely available system that clusters similar contexts. It can be applied to a wide range of problems, although here we focus on word sense and name discrimination. It supports several different measures for automatically determining the number of clusters in which a collection of contexts should be grouped. These can be used to discover the number of senses in which a word is used in a large corpus of text, or the number of entities that share the same name. There are three measures based on clustering criterion functions, and another on the Gap Statistic.
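
To illustrate the general idea of cluster stopping with the Gap Statistic, the following is a minimal sketch, not the SenseClusters implementation. It assumes contexts are already represented as numeric vectors, and it substitutes plain k-means with the within-cluster sum of squares for the clustering method and criterion functions of the system. The function names (`kmeans_wss`, `gap_statistic_k`) and the parameters (`k_max`, `n_refs`, `seed`) are hypothetical. The stopping rule picks the smallest k whose gap is at least the gap at k+1 minus a standard-error term estimated from uniform reference data.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans_wss(points, k, rng, restarts=5, iters=50):
    """Lowest within-cluster sum of squares found by plain k-means.

    Stand-in criterion for this sketch (an assumption, not the
    criterion functions used by SenseClusters).
    """
    best = math.inf
    for _ in range(restarts):
        centers = rng.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                clusters[j].append(p)
            # An empty cluster keeps its previous center.
            new_centers = [centroid(c) if c else centers[i]
                           for i, c in enumerate(clusters)]
            if new_centers == centers:
                break
            centers = new_centers
        wss = sum(sq_dist(p, centers[i])
                  for i, c in enumerate(clusters) for p in c)
        best = min(best, wss)
    return best


def gap_statistic_k(points, k_max=6, n_refs=10, seed=0):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to k_max."""
    rng = random.Random(seed)
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        # Floor the dispersion so log() is defined when a fit is perfect.
        log_w = math.log(max(kmeans_wss(points, k, rng), 1e-12))
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: uniform over the bounding box of the input.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(max(kmeans_wss(ref, k, rng), 1e-12)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    for k in range(1, k_max):
        # gaps[k - 1] is Gap(k); gaps[k] and errs[k] belong to k + 1.
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k
    return k_max
```

Calling `gap_statistic_k(vectors)` on a list of equal-length numeric tuples returns the estimated number of clusters. In a word sense discrimination setting this would correspond to the number of senses, under the assumptions stated above.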