Paper: A Comparison Of Manual And Automatic Constructions Of Category Hierarchy For Classifying Large Corpora

ACL ID W04-2409
Title A Comparison Of Manual And Automatic Constructions Of Category Hierarchy For Classifying Large Corpora
Venue International Conference on Computational Natural Language Learning
Session Main Conference
Year 2004
Authors

We address the problem of dealing with a large collection of data, and investigate automatically constructing a category hierarchy from a given set of categories to improve classification of large corpora. We use two well-known techniques, partitioning clustering (k-means) and a loss function, to create the category hierarchy. k-means is used to cluster the given categories into a hierarchy. To select the proper number k, we use a loss function which measures the degree of our disappointment in any differences between the true distribution over inputs and the learner's prediction. Once the optimal number k is selected, the procedure is repeated for each cluster. Our evaluation using the 1996 Reuters corpus, which consists of 806,791 documents, shows ...
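
The procedure outlined in the abstract (cluster the categories, choose the number of clusters by minimizing a loss, then repeat within each cluster) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`kmeans`, `placeholder_loss`, `build_hierarchy`), the parameters `max_k`, `min_size`, and `penalty`, and the representation of each category as a numeric vector are all assumptions. In particular, `placeholder_loss` (within-cluster squared distance plus a per-cluster penalty) is a stand-in and is not the loss function used in the paper, which compares the true distribution with the learner's prediction.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd-style k-means; returns one cluster index per point."""
    # Deterministic initialisation (an assumption): k evenly spaced points.
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign

def placeholder_loss(points, assign, k, penalty=1.0):
    """Hypothetical stand-in for the paper's loss function."""
    total = 0.0
    for c in set(assign):
        members = [p for p, a in zip(points, assign) if a == c]
        center = tuple(sum(x) / len(members) for x in zip(*members))
        total += sum(math.dist(p, center) ** 2 for p in members)
    return total + penalty * k

def build_hierarchy(names, points, max_k=3, min_size=3, loss=placeholder_loss):
    """Recursively split categories; leaves are flat lists of category names."""
    if len(names) < min_size:
        return list(names)
    best = None
    # Try each candidate k and keep the clustering with the lowest loss.
    for k in range(2, min(max_k, len(names) - 1) + 1):
        assign = kmeans(points, k)
        score = loss(points, assign, k)
        if best is None or score < best[0]:
            best = (score, assign)
    if best is None:
        return list(names)
    clusters = sorted(set(best[1]))
    if len(clusters) < 2:
        # No real split was found, so stop recursing here.
        return list(names)
    tree = []
    for c in clusters:
        idx = [i for i, a in enumerate(best[1]) if a == c]
        # Repeat the same procedure inside each cluster.
        tree.append(build_hierarchy([names[i] for i in idx],
                                    [points[i] for i in idx],
                                    max_k, min_size, loss))
    return tree
```

A caller would pass category names together with one vector per category, for example `build_hierarchy(["a", "b", "c", "d"], [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)])`, and could supply a different `loss` callable to experiment with other selection criteria.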