Paper: The Role of Information Extraction in the Design of a Document Triage Application for Biocuration

ACL ID W11-0206
Title The Role of Information Extraction in the Design of a Document Triage Application for Biocuration
Venue Workshop on Biomedical Natural Language Processing
Session
Year 2011
Authors

Traditionally, automated triage of papers is performed using lexical (unigram, bigram, and sometimes trigram) features. This paper explores the use of information extraction (IE) techniques to create richer linguistic features than traditional bag-of-words models. Our classifier includes lexico-syntactic patterns and more-complex features that represent a pattern coupled with its extracted noun, represented both as a lexical term and as a semantic category. Our experimental results show that the IE-based features can improve performance over unigram and bigram features alone. We present intrinsic evaluation results of full-text document classification experiments to determine automatically whether a paper should be considered of interest to biologists at the Mouse Genome Inform...
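
To make the feature types named in the abstract concrete, the following is a minimal sketch of how n-gram features and IE-based features might be combined into one feature vector for a document. It is an illustration under stated assumptions, not the authors' implementation: the pattern string `expression_of_<NP>`, the semantic category `GENE`, the feature-name prefixes, and the example sentence are all hypothetical, and the sketch takes the extraction step as already done, supplying (pattern, noun, category) triples by hand.

```python
from collections import Counter


def ngram_features(tokens, n_max=2):
    """Count unigram and bigram features (the bag-of-words baseline)."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats["ng:" + "_".join(tokens[i:i + n])] += 1
    return feats


def ie_features(extractions):
    """Count IE-based features from (pattern, noun, category) triples.

    Each triple yields three features: the pattern alone, the pattern
    coupled with the extracted noun as a lexical term, and the pattern
    coupled with the noun's semantic category.
    """
    feats = Counter()
    for pattern, noun, category in extractions:
        feats["pat:" + pattern] += 1
        feats["pat+lex:" + pattern + "|" + noun] += 1
        feats["pat+sem:" + pattern + "|" + category] += 1
    return feats


def document_features(tokens, extractions):
    """Merge both feature families into a single sparse count vector."""
    return ngram_features(tokens) + ie_features(extractions)


# Hypothetical input: a tokenized sentence and one hand-written extraction.
tokens = "expression of pax6 in mouse embryo".split()
extractions = [("expression_of_<NP>", "pax6", "GENE")]
features = document_features(tokens, extractions)
```

The resulting counts could be passed to any standard classifier. Keeping the lexical and semantic variants as separate features lets a model generalize from a specific noun to its category when the exact term is rare.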