Paper: Classifying Gene Sentences in Biomedical Literature by Combining High-Precision Gene Identifiers

ACL ID W12-2423
Title Classifying Gene Sentences in Biomedical Literature by Combining High-Precision Gene Identifiers
Venue Workshop on Biomedical Natural Language Processing
Session
Year 2012
Authors

Gene name identification is a fundamental step to solve more complicated text mining problems such as gene normalization and protein-protein interactions. However, state-of-the-art name identification methods are not yet sufficient for use in a fully automated system. In this regard, a relaxed task, gene/protein sentence identification, may serve more effectively for manually searching and browsing biomedical literature. In this paper, we set up a new task, gene/protein sentence classification, and propose an ensemble approach for addressing this problem. Well-known named entity tools use similar gold-standard sets for training and testing, which results in relatively poor performance for unknown sets. We here explore how to combine diverse high-precision gene identifi...
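
The abstract describes classifying whole sentences by combining several high-precision gene identifiers, but this excerpt does not show how the combination is done. The sketch below illustrates one plausible reading only: a simple vote in which a sentence is labeled a gene sentence when at least `min_votes` identifiers fire. The identifiers (`dictionary_identifier`, `pattern_identifier`), the toy lexicon, and the voting rule are all hypothetical stand-ins, not the paper's components.

```python
import re
from typing import Callable, Iterable, List

# An identifier maps a sentence to True if it detects a gene mention.
Identifier = Callable[[str], bool]


def dictionary_identifier(lexicon: Iterable[str]) -> Identifier:
    """Hypothetical identifier: whole-word, case-insensitive lexicon lookup."""
    words = sorted(set(lexicon), key=len, reverse=True)
    if not words:
        return lambda sentence: False
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return lambda sentence: bool(pattern.search(sentence))


def pattern_identifier(sentence: str) -> bool:
    """Hypothetical identifier: symbols such as 'BRCA1' (2+ capitals, then digits)."""
    return bool(re.search(r"\b[A-Z]{2,}\d+\b", sentence))


def classify_sentence(
    sentence: str, identifiers: List[Identifier], min_votes: int = 1
) -> bool:
    """Label a sentence as a gene sentence if enough identifiers fire.

    With min_votes=1 this is a union of the identifiers, which trades
    some precision for recall (an assumption, not the paper's rule).
    """
    votes = sum(1 for identify in identifiers if identify(sentence))
    return votes >= min_votes


# Toy lexicon, for illustration only.
identifiers = [dictionary_identifier({"p53", "insulin"}), pattern_identifier]
```

Raising `min_votes` makes the classifier stricter: a sentence must then be flagged by several identifiers independently, which would favor precision over recall.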