Paper: Text Mining Techniques for Leveraging Positively Labeled Data

ACL ID W11-0220
Title Text Mining Techniques for Leveraging Positively Labeled Data
Venue Workshop on Biomedical Natural Language Processing
Session
Year 2011
Authors

Suppose we have a large collection of documents, most of which are unlabeled. Suppose further that a small subset of these documents is labeled as positive examples, i.e., as members of a particular class we are interested in. We may have reason to believe that the large unlabeled collection contains more documents of this positive class. What data mining techniques could help us find these unlabeled positive examples? Here we examine machine learning strategies designed to solve this problem. We find that a proper choice of machine learning method and training strategy can substantially improve the retrieval, from the large collection, of data enriched with positive examples. We illustrate the principles with a real example co...