Paper: An improved corpus of disease mentions in PubMed citations

ACL ID: W12-2411
Title: An improved corpus of disease mentions in PubMed citations
Venue: Workshop on Biomedical Natural Language Processing
Year: 2012

The latest discoveries on diseases and their diagnosis and treatment are mostly disseminated in the form of scientific publications. However, the biomedical literature is growing rapidly, and disease names show a high level of variation and ambiguity. As a result, retrieving disease-related articles with the traditional keyword-based approach is increasingly challenging. An important first step for any disease-related information extraction task in the biomedical literature is disease mention recognition. Despite strong interest, there has been relatively little work on disease name identification, perhaps because adequate corpora are difficult to obtain. To this end, we created a large-scale disease corpus consisting of 6900 disease...