Paper: N Semantic Classes Are Harder Than Two

ACL ID P06-2007
Title N Semantic Classes Are Harder Than Two
Venue Annual Meeting of the Association for Computational Linguistics
Session Poster Session
Year 2006

We show that we can automatically classify semantically related phrases into 10 classes. Classification robustness is improved by training with multiple sources of evidence, including within-document cooccurrence, HTML markup, syntactic relationships in sentences, substitutability in query logs, and string similarity. Our work provides a benchmark for automatic n-way classification into WordNet’s semantic classes, both on a TREC news corpus and on a corpus of substitutable search query phrases.
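
As a rough illustration of what "multiple sources of evidence" could look like in practice, the sketch below turns two of the listed sources (string similarity and within-document cooccurrence) into one feature vector for a phrase pair. It is not the paper's feature set or classifier. The function names, the Jaccard measure, the substring matching, and the toy documents are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Character-level similarity in [0, 1]; an assumed stand-in for string-similarity evidence."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cooccurrence(a, b, documents):
    """Jaccard overlap of the sets of documents that contain each phrase.

    Uses naive case-insensitive substring matching (an assumption, not the paper's method).
    """
    docs_a = {i for i, d in enumerate(documents) if a.lower() in d.lower()}
    docs_b = {i for i, d in enumerate(documents) if b.lower() in d.lower()}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def pair_features(a, b, documents):
    """Concatenate the evidence sources into one feature vector for a phrase pair."""
    return [string_similarity(a, b), cooccurrence(a, b, documents)]


# Hypothetical toy corpus, invented for this example.
docs = [
    "The car dealer sold a used automobile.",
    "An automobile is also called a car.",
    "The weather was cold today.",
]
features = pair_features("car", "automobile", docs)
```

A vector like this could then be passed to any n-way classifier. Further evidence sources (HTML markup, syntactic relationships, query-log substitutability) would add more entries to the same vector.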