Paper: Employing Topic Models for Pattern-based Semantic Class Discovery

ACL ID P09-1052
Title Employing Topic Models for Pattern-based Semantic Class Discovery
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009
Authors

A semantic class is a collection of items (words or phrases) that stand in a semantic peer or sibling relationship. This paper studies the use of topic models to automatically construct semantic classes, taking as source data a collection of raw semantic classes (RASCs) extracted by applying predefined patterns to web pages. The primary requirement (and challenge) is handling multi-membership: an item may belong to multiple semantic classes, and we need to discover as many of those classes as possible. To adopt topic models, we treat RASCs as “documents”, items as “words”, and the final semantic classes as “topics”. Appropriate preprocessing and postprocessing are performed to improve ...
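The mapping of RASCs to "documents", items to "words", and semantic classes to "topics" can be illustrated with a minimal sketch. The abstract does not say which topic model or inference procedure is used, so this example assumes LDA with collapsed Gibbs sampling. The sample RASCs, the hyperparameters (`alpha`, `beta`, `iters`), and the helper names `lda_gibbs` and `memberships` are hypothetical and chosen only for illustration. The paper's preprocessing and postprocessing steps are not shown.

```python
import random
from collections import defaultdict

# Hypothetical raw semantic classes (RASCs): each is one "document" whose
# "words" are items. "jaguar" appears in both animal and car lists, which is
# the multi-membership case the abstract describes.
rascs = [
    ["jaguar", "lion", "tiger", "leopard"],
    ["lion", "tiger", "cheetah", "jaguar"],
    ["jaguar", "bmw", "audi", "toyota"],
    ["bmw", "audi", "toyota", "honda"],
    ["tiger", "leopard", "cheetah"],
    ["honda", "toyota", "jaguar", "audi"],
]


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns p(item | topic) per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # counts n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                     # counts n(k)
    assign = []

    # Random initial topic assignment for every item occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed estimate of p(item | topic) over the whole vocabulary.
    return [
        {
            w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
            for w in vocab
        }
        for k in range(num_topics)
    ]


def memberships(item, topics, threshold=0.05):
    """Indices of topics in which the item has probability >= threshold."""
    return [k for k, dist in enumerate(topics) if dist.get(item, 0.0) >= threshold]


topics = lda_gibbs(rascs, num_topics=2)
for k, dist in enumerate(topics):
    top_items = sorted(dist, key=dist.get, reverse=True)[:4]
    print(f"topic {k}: {top_items}")
print("jaguar belongs to topics:", memberships("jaguar", topics))
```

Because each topic is a distribution over all items, one item can carry substantial probability in several topics at once. That is what makes this formulation a natural fit for multi-membership, in contrast to a hard clustering that would force each item into a single class. The threshold in `memberships` is an arbitrary illustrative cutoff, not a value taken from the paper.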