Paper: Identifying Chemical Names In Biomedical Text: An Investigation Of Substring Co-Occurrence Based Approaches

ACL ID N04-2002
Title Identifying Chemical Names In Biomedical Text: An Investigation Of Substring Co-Occurrence Based Approaches
Venue Human Language Technologies
Session Student Session
Year 2004
Authors

We investigate various strategies for finding chemical names in biomedical text using substring co-occurrence information. The goal is to build a system from readily available data with minimal human involvement. Our models are trained on a dictionary of chemical names and on general biomedical text. The strategies we investigate include Naïve Bayes classifiers and several types of N-gram models. We also introduce a new way of interpolating N-grams that does not require tuning any parameters. Finally, we found the task to be similar to language identification.
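The abstract does not spell out the models, so the following is only a minimal sketch of the language-identification framing, under several assumptions. It trains one character trigram model on chemical names and one on general biomedical words, then labels a token as chemical when the chemical model assigns it the higher log-likelihood. It uses plain add-one smoothing rather than the parameter-free interpolation the authors introduce. The class `CharNgramModel`, the function `is_chemical`, the boundary symbols `^` and `$`, and the tiny word lists are all hypothetical stand-ins for the real dictionary and corpus.

```python
import math
from collections import Counter


class CharNgramModel:
    """Hypothetical character n-gram model with add-one (Laplace) smoothing."""

    def __init__(self, words, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of (context + next character)
        self.contexts = Counter()  # counts of each context used for a prediction
        self.alphabet = set()
        for w in words:
            padded = self._pad(w)
            self.alphabet.update(padded)
            for i in range(n - 1, len(padded)):
                ctx = padded[i - n + 1:i]
                self.ngrams[ctx + padded[i]] += 1
                self.contexts[ctx] += 1

    def _pad(self, word):
        # Assumed boundary symbols: '^' pads the start, '$' marks the end.
        return "^" * (self.n - 1) + word.lower() + "$"

    def log_prob(self, word):
        padded = self._pad(word)
        v = len(self.alphabet) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            ctx = padded[i - self.n + 1:i]
            count = self.ngrams[ctx + padded[i]]
            total += math.log((count + 1) / (self.contexts[ctx] + v))
        return total


def is_chemical(word, chem_model, general_model):
    # Language-identification style decision: choose the more likely "language".
    return chem_model.log_prob(word) > general_model.log_prob(word)


# Hypothetical toy training data, NOT the dictionary or corpus used in the paper.
chemical_words = ["methanol", "ethanol", "propanol", "butanol",
                  "benzene", "toluene", "acetone", "chloroform"]
general_words = ["protein", "cells", "expression", "patients",
                 "study", "binding", "results", "analysis"]

chem_model = CharNgramModel(chemical_words)
general_model = CharNgramModel(general_words)

for token in ["pentanol", "proteins"]:
    print(token, is_chemical(token, chem_model, general_model))
```

Neither test token appears in the training lists. With data this small the decision rests on a handful of shared substrings such as "ano" and "nol" versus "rot" and "ein", which is the kind of substring co-occurrence signal the abstract refers to. A realistic system would need far larger training sets and better smoothing.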