Paper: Improved Pattern Learning for Bootstrapped Entity Extraction

ACL ID W14-1611
Title Improved Pattern Learning for Bootstrapped Entity Extraction
Venue International Conference on Computational Natural Language Learning
Year 2014

Bootstrapped pattern learning for entity extraction usually starts with seed entities and iteratively learns patterns and entities from unlabeled text. Patterns are scored by their ability to extract more positive entities and fewer negative entities. A problem is that, due to the lack of labeled data, unlabeled entities are either assumed to be negative or are ignored by the existing pattern scoring measures. In this paper, we improve pattern scoring by predicting the labels of unlabeled entities. We use various unsupervised features based on contrasting domain-specific and general text, and exploiting distributional similarity and edit distances to learned entities. Our system outperforms existing pattern scoring algorithms for extracting drug-and-treatment entities from four ...
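
The three treatments of unlabeled entities described above can be contrasted in a minimal sketch. This is not the paper's scoring function, which the abstract does not give. It assumes a simple precision-style score, and the function name `score_pattern`, the `mode` values, the example entities, and the predicted probability of 0.9 are all hypothetical.

```python
def score_pattern(extracted, positives, negatives, mode="predict", predicted=None):
    """Precision-style score for one pattern (illustrative assumption only).

    extracted: set of entities the pattern matched
    positives, negatives: sets of entities with known labels
    mode: "negative" counts unlabeled entities as negative,
          "ignore" drops them,
          "predict" lets each contribute its predicted positive probability
    predicted: dict mapping entity -> probability of being positive (hypothetical)
    """
    predicted = predicted or {}
    pos = 0.0
    neg = 0.0
    for entity in extracted:
        if entity in positives:
            pos += 1.0
        elif entity in negatives:
            neg += 1.0
        elif mode == "negative":
            neg += 1.0
        elif mode == "predict":
            p = predicted.get(entity, 0.0)  # no prediction: treated as negative
            pos += p
            neg += 1.0 - p
        # mode == "ignore": the unlabeled entity contributes nothing
    total = pos + neg
    return pos / total if total else 0.0


# Hypothetical toy data: "naproxen" is the only unlabeled entity.
extracted = {"aspirin", "ibuprofen", "naproxen", "headache"}
positives = {"aspirin", "ibuprofen"}
negatives = {"headache"}
predicted = {"naproxen": 0.9}

scores = {
    mode: score_pattern(extracted, positives, negatives, mode, predicted)
    for mode in ("negative", "ignore", "predict")
}
```

In this toy case the unlabeled entity is likely positive, so using its predicted label gives the pattern a higher score than either assuming it is negative or ignoring it. In the paper, the predictions come from unsupervised features rather than from a hand-set value.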