Paper: Using Bilingual Comparable Corpora And Semi-Supervised Clustering For Topic Tracking

ACL ID P06-2030
Title Using Bilingual Comparable Corpora And Semi-Supervised Clustering For Topic Tracking
Venue Annual Meeting of the Association for Computational Linguistics
Session Poster Session
Year 2006
Authors

We address the problem of skewed training data and propose a method for estimating effective training stories for the topic tracking task. To compensate for the small number of labelled positive stories, we extract story pairs, each consisting of a positive story and its associated stories, from bilingual comparable corpora. To overcome the problem posed by the large number of labelled negative stories, we partition them into clusters using k-means combined with EM. Results on the TDT corpora show the effectiveness of the method.
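The abstract states that negative stories are clustered with "k-means with EM" but gives no further detail. The sketch below shows one common way to combine the two, under stated assumptions: each story is a term-count vector, hard k-means under cosine similarity (with deterministic farthest-first seeding) produces an initial partition, and EM over a multinomial mixture model, initialized from that partition, refines it. The function names `kmeans` and `em_multinomial`, the smoothing parameter `alpha`, and the choice of a multinomial mixture are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(docs: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Hard k-means under cosine similarity with deterministic farthest-first seeding."""
    centroids = [list(docs[0])]
    while len(centroids) < k:
        # Seed the next centroid with the story least similar to every existing centroid.
        nxt = min(docs, key=lambda d: max(cosine(d, c) for c in centroids))
        centroids.append(list(nxt))
    assign = [0] * len(docs)
    for _ in range(iters):
        # Assign each story to its most similar centroid.
        assign = [max(range(k), key=lambda j: cosine(d, centroids[j])) for d in docs]
        for j in range(k):
            members = [d for d, a in zip(docs, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def em_multinomial(docs: List[Vector], assign: List[int], k: int,
                   iters: int = 20, alpha: float = 1.0) -> List[int]:
    """EM for a multinomial mixture, initialized from hard k-means labels."""
    n, v = len(docs), len(docs[0])
    # Initial responsibilities: one-hot vectors from the k-means partition.
    resp = [[1.0 if assign[i] == j else 0.0 for j in range(k)] for i in range(n)]
    for _ in range(iters):
        # M-step: Laplace-smoothed mixing weights and per-cluster word distributions.
        weights = [(sum(r[j] for r in resp) + alpha) / (n + k * alpha) for j in range(k)]
        word_probs = []
        for j in range(k):
            counts = [alpha + sum(resp[i][j] * docs[i][w] for i in range(n))
                      for w in range(v)]
            total = sum(counts)
            word_probs.append([c / total for c in counts])
        # E-step: posterior cluster probabilities per story, computed in log space.
        new_resp = []
        for d in docs:
            logs = [math.log(weights[j])
                    + sum(c * math.log(word_probs[j][w]) for w, c in enumerate(d) if c)
                    for j in range(k)]
            m = max(logs)
            exps = [math.exp(x - m) for x in logs]
            z = sum(exps)
            new_resp.append([e / z for e in exps])
        resp = new_resp
    # Return the most probable cluster for each story.
    return [max(range(k), key=lambda j: r[j]) for r in resp]


if __name__ == "__main__":
    # Toy term-count vectors for four negative stories over a 4-word vocabulary.
    negatives = [[5, 4, 0, 0], [6, 3, 0, 1], [0, 0, 5, 5], [1, 0, 4, 6]]
    hard = kmeans(negatives, k=2)
    print(em_multinomial(negatives, hard, k=2))  # prints [0, 0, 1, 1]
```

Seeding k-means deterministically keeps the example reproducible; EM then replaces the hard k-means assignments with soft posteriors, which lets stories near a cluster boundary contribute fractionally to more than one cluster model.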