Paper: A Categorial Variation Database For English

ACL ID: N03-1013
Title: A Categorial Variation Database For English
Venue: Human Language Technologies
Session: Main Conference
Year: 2003

We describe our approach to the construction and evaluation of a large-scale database called “CatVar” which contains categorial variations of English lexemes. Due to the prevalence of cross-language categorial variation in multilingual applications, our categorial-variation resource may serve as an integral part of a diverse range of natural language applications. Thus, the research reported herein overlaps heavily with that of the machine-translation, lexicon-construction, and information-retrieval communities. We apply the information-retrieval metrics of precision and recall to evaluate the accuracy and coverage of our database with respect to a human-produced gold standard. This evaluation reveals that the categorial database achieves a high degree of precision and recall. Ad...
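To make the evaluation metrics concrete, here is a minimal sketch of how precision (accuracy) and recall (coverage) could be computed against a gold standard. It assumes that both the database and the gold standard map each lexeme to the set of its categorial variants, and that counts are micro-averaged over all gold-standard lexemes. The function name `precision_recall`, the `word_POS` notation, and the sample entries are all hypothetical illustrations. This is not the paper's exact evaluation procedure.

```python
from typing import Dict, Set, Tuple


def precision_recall(predicted: Dict[str, Set[str]],
                     gold: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Micro-averaged precision and recall over the lexemes in the gold standard.

    Hypothetical representation: each lexeme maps to the set of its
    categorial variants (e.g. "hunger_V" -> {"hunger_N", "hungry_AJ"}).
    """
    correct = 0   # variants proposed by the database that the gold standard confirms
    n_pred = 0    # all variants proposed by the database
    n_gold = 0    # all variants listed in the gold standard
    for lexeme, gold_vars in gold.items():
        pred_vars = predicted.get(lexeme, set())
        correct += len(pred_vars & gold_vars)
        n_pred += len(pred_vars)
        n_gold += len(gold_vars)
    precision = correct / n_pred if n_pred else 0.0  # accuracy of proposals
    recall = correct / n_gold if n_gold else 0.0     # coverage of the gold standard
    return precision, recall


# Hypothetical toy data, for illustration only.
gold = {
    "hunger_V": {"hunger_N", "hungry_AJ", "hungriness_N"},
    "develop_V": {"development_N", "developer_N"},
}
predicted = {
    "hunger_V": {"hunger_N", "hungry_AJ"},                                  # misses one variant
    "develop_V": {"development_N", "developer_N", "devil_N", "envelope_N"},  # two spurious ones
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.800
```

In this toy case, 4 of the 6 proposed variants are correct (precision 4/6), and 4 of the 5 gold variants are found (recall 4/5). Spurious clusters lower precision, while missing variants lower recall, which is why the two metrics separately capture accuracy and coverage.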