Paper: Minimally-Supervised Extraction of Entities from Text Advertisements

ACL ID N10-1009
Title Minimally-Supervised Extraction of Entities from Text Advertisements
Venue Human Language Technologies
Session Main Conference
Year 2010
Authors

Extraction of entities from ad creatives is an important problem that can benefit many computational advertising tasks. Supervised and semi-supervised solutions rely on labeled data, which is expensive, time-consuming, and difficult to procure for ad creatives. A small set of manually derived constraints on feature expectations over unlabeled data can be used to partially and probabilistically label large amounts of data. Utilizing recent work in constraint-based semi-supervised learning, this paper injects lightweight supervision specified as these “constraints” into a semi-Markov conditional random field model of entity extraction in ad creatives. Relying solely on the constraints, the model is trained on a set of unlabeled ads using an online learning algorithm. We demons...