Paper: Incorporating GENETAG-style annotation to GENIA corpus

ACL ID W09-1313
Title Incorporating GENETAG-style annotation to GENIA corpus
Venue Workshop on Biomedical Natural Language Processing
Year 2009

RNA are physical entities. While the three physical entity types are disjoint, the abstract concept, gene, is defined from a different perspective and is realized in, not disjoint from, the physical entity types. The latest public version of the GENIA corpus (hereafter “old corpus”) contains annotations for gene-related entities, but they are classified into only physical entity types: Protein, DNA and RNA.

                 Protein     DNA    RNA     GGP
Old Annotation    21,489   8,653    876     N/A
New Annotation    15,452   7,872    863  12,272

Table 1: Statistics on annotation for gene-related entities

The corpus revisions described in this work are two-fold. First, annotations for the abstract entity, gene, were added (Table 1, GGP). To emphasize the characteristics of the new entity type, which does not distinguish a gene and its pr...