Paper: Multilingual And Cross-Lingual News Topic Tracking

ACL ID C04-1138
Title Multilingual And Cross-Lingual News Topic Tracking
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2004
Authors

We are presenting a working system for automated news analysis that ingests an average total of 7600 news articles per day in five languages. For each language, the system detects the major news stories of the day using a group-average unsupervised ag- glomerative clustering process. It also tracks, for each cluster, related groups of articles published over the previous seven days, using a cosine of weighted terms. The system furthermore tracks re- lated news across languages, in all language pairs involved. The cross-lingual news cluster similarity is based on a linear combination of three types of input: (a) cognates, (b) automatically detected ref- erences to geographical place names and (c) the re- sults of a mapping process onto a multilingual clas- sification system. A manual evalua...