Paper: Biomedical Named Entity Recognition Using Conditional Random Fields And Rich Feature Sets

ACL ID W04-1221
Title Biomedical Named Entity Recognition Using Conditional Random Fields And Rich Feature Sets
Venue International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)
Year 2004
Authors
  • Burr Settles (University of Wisconsin-Madison, Madison WI)

This paper presents a framework for simultaneously recognizing occurrences of PROTEIN, DNA, RNA, CELL-LINE, and CELL-TYPE entity classes using Conditional Random Fields with a variety of traditional and novel features. I show that this approach can achieve an overall F1 measure around 70, which seems to be the current state of the art.

The system described here was developed as part of the BioNLP/NLPBA 2004 shared task. Experiments were conducted on a training and evaluation set provided by the task organizers.

2 Conditional Random Fields

Biomedical named entity recognition can be thought of as a sequence segmentation problem: each word is a token in a sequence to be assigned a label (e.g. PROTEIN, DNA, RNA, CELL-LINE, CELL-TYPE, or OTHER). Conditional Random Fields (CRFs) are undirected ...
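To make the sequence-labeling view concrete, here is a minimal sketch of a standard linear-chain CRF. It scores a label sequence y for tokens x as the sum over positions t of weighted features, computes p(y|x) = exp(score(y)) / Z(x) with the forward algorithm, and finds the best labeling with Viterbi decoding. This is not the author's implementation. Several simplifications are assumptions made for illustration: the per-token "emission" scores stand in for already-summed weighted features λ_k f_k(x, t, y), the label set is reduced to three classes, and all numbers are hypothetical rather than learned from the shared-task data. The function names are also hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emit, trans, labels):
    """Unnormalized log-score of one label sequence (indices into the label set).

    emit[t][y]   : hypothetical precomputed sum of weighted features for label y at token t
    trans[i][j]  : hypothetical weight for moving from label i to label j
    """
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score


def log_partition(emit, trans):
    """Forward algorithm: log Z(x), summing over every possible label sequence."""
    n_labels = len(emit[0])
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][j] + logsumexp([alpha[i] + trans[i][j] for i in range(n_labels)])
            for j in range(n_labels)
        ]
    return logsumexp(alpha)


def conditional_log_prob(emit, trans, labels):
    """log p(labels | tokens) under the CRF."""
    return sequence_score(emit, trans, labels) - log_partition(emit, trans)


def viterbi(emit, trans):
    """Return the highest-scoring label sequence (list of label indices)."""
    n_labels = len(emit[0])
    delta = list(emit[0])
    backptrs = []
    for t in range(1, len(emit)):
        new_delta, ptrs = [], []
        for j in range(n_labels):
            # Best previous label i for ending in label j at position t.
            best_i = max(range(n_labels), key=lambda i: delta[i] + trans[i][j])
            new_delta.append(delta[best_i] + trans[best_i][j] + emit[t][j])
            ptrs.append(best_i)
        delta = new_delta
        backptrs.append(ptrs)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: delta[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return list(reversed(path))


# Hypothetical toy example: reduced label set and hand-picked scores.
LABELS = ["PROTEIN", "CELL-TYPE", "OTHER"]
tokens = ["IL-2", "activates", "T", "cells"]
emit = [
    [2.0, 0.2, 0.1],  # "IL-2"
    [0.1, 0.0, 1.5],  # "activates"
    [0.3, 1.0, 0.4],  # "T"
    [0.0, 1.2, 0.5],  # "cells"
]
trans = [
    [0.2, -0.5, 0.3],  # from PROTEIN
    [-0.5, 0.8, 0.1],  # from CELL-TYPE
    [0.2, 0.3, 0.4],   # from OTHER
]

best = viterbi(emit, trans)
print([LABELS[y] for y in best])  # ['PROTEIN', 'OTHER', 'CELL-TYPE', 'CELL-TYPE']
print(math.exp(conditional_log_prob(emit, trans, best)))
```

Because the transition weights reward CELL-TYPE following CELL-TYPE, the model labels "T cells" as a single multi-token span even though "T" alone has only a moderate CELL-TYPE score. This kind of label dependency is what a chain-structured model captures and independent per-token classifiers do not.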