ACL Anthology Network (All About NLP) (beta): The Association for Computational Linguistics Anthology Network
ACL ID | W11-0206 |
---|---|
Title | The Role of Information Extraction in the Design of a Document Triage Application for Biocuration |
Venue | Workshop on Biomedical Natural Language Processing |
Session | |
Year | 2011 |
Authors | |
Abstract | Traditionally, automated triage of papers is performed using lexical (unigram, bigram, and sometimes trigram) features. This paper explores the use of information extraction (IE) techniques to create richer linguistic features than traditional bag-of-words models. Our classifier includes lexico-syntactic patterns and more complex features that represent a pattern coupled with its extracted noun, represented both as a lexical term and as a semantic category. Our experimental results show that the IE-based features can improve performance over unigram and bigram features alone. We present intrinsic evaluation results of full-text document classification experiments to determine automatically whether a paper should be considered of interest to biologists at the Mouse Genome Inform... |
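The abstract contrasts traditional unigram/bigram features with IE-based features: a lexico-syntactic pattern alone, the pattern coupled with its extracted noun as a lexical term, and the pattern coupled with that noun's semantic category. The sketch below shows one way these feature families might be merged into a single sparse feature vector for a document classifier. It is illustrative only and not the authors' implementation: the pattern notation (`<subj> was expressed`), the feature-name prefixes, the `SEMANTIC_LEXICON` contents, and the assumption that an upstream extractor already supplies `(pattern, noun)` pairs are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical semantic lexicon mapping nouns to categories (illustrative only).
SEMANTIC_LEXICON: Dict[str, str] = {
    "mice": "ORGANISM",
    "mouse": "ORGANISM",
    "pax6": "GENE",
    "brain": "TISSUE",
}


def ngram_features(tokens: List[str]) -> Counter:
    """Traditional bag-of-words features: unigram and bigram counts."""
    feats = Counter(f"uni={t}" for t in tokens)
    feats.update(f"bi={a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return feats


def ie_features(extractions: Iterable[Tuple[str, str]],
                lexicon: Dict[str, str]) -> Counter:
    """IE-based features from hypothetical (pattern, extracted_noun) pairs.

    Each extraction yields the pattern alone, the pattern coupled with the
    noun as a lexical term, and (when the lexicon knows the noun) the
    pattern coupled with the noun's semantic category.
    """
    feats = Counter()
    for pattern, noun in extractions:
        noun = noun.lower()
        feats[f"pat={pattern}"] += 1
        feats[f"pat+noun={pattern}|{noun}"] += 1
        category = lexicon.get(noun)
        if category is not None:
            feats[f"pat+sem={pattern}|{category}"] += 1
    return feats


def document_features(text: str,
                      extractions: Iterable[Tuple[str, str]],
                      lexicon: Dict[str, str] = SEMANTIC_LEXICON) -> Counter:
    """Combine n-gram and IE-based features into one sparse vector."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    feats = ngram_features(tokens)
    feats.update(ie_features(extractions, lexicon))
    return feats


if __name__ == "__main__":
    text = "Pax6 was expressed in the brain of mutant mice"
    # Hypothetical extractor output: (lexico-syntactic pattern, extracted noun).
    extractions = [("<subj> was expressed", "Pax6"),
                   ("expressed in <np>", "brain")]
    for name, count in sorted(document_features(text, extractions).items()):
        print(name, count)
```

Keeping all feature families in one `Counter` lets the same downstream learner compare an n-gram-only baseline against n-grams plus IE features simply by including or omitting the `ie_features` call.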