Paper: Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment

ACL ID D09-1014
Title Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009
Authors

Traditionally, machine learning approaches for information extraction require human annotated data that can be costly and time-consuming to produce. However, in many cases, there already exists a database (DB) with schema related to the desired output, and records related to the expected input text. We present a conditional random field (CRF) that aligns tokens of a given DB record and its realization in text. The CRF model is trained using only the available DB and unlabeled text with generalized expectation criteria. An annotation of the text induced from inferred alignments is used to train an information extractor. We evaluate our method on a citation extraction task in which alignments between DBLP database records and citation texts are used to train an extractor. Experimental ...
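
The step in which an annotation of the text is induced from alignments can be illustrated with a small sketch. This is not the paper's model: naive case-insensitive exact token matching stands in for the CRF alignment that the abstract describes, and the names `tokenize` and `project_labels`, the label `O` for unaligned tokens, and the sample record and citation are all hypothetical.

```python
from typing import Dict, List, Tuple


def tokenize(s: str) -> List[str]:
    """Split on whitespace after replacing commas and periods with spaces."""
    return s.replace(",", " ").replace(".", " ").split()


def project_labels(
    record: Dict[str, str], text: str, default: str = "O"
) -> List[Tuple[str, str]]:
    """Label each text token with the DB field whose value contains it.

    Assumption: exact (case-insensitive) token match replaces the learned
    alignment; a token that occurs in several fields takes the first field.
    """
    token_to_field: Dict[str, str] = {}
    for field, value in record.items():
        for tok in tokenize(value):
            token_to_field.setdefault(tok.lower(), field)
    return [(tok, token_to_field.get(tok.lower(), default)) for tok in tokenize(text)]


# Hypothetical DB record and one textual realization of it.
record = {"author": "J. Smith", "title": "Learning to Align", "year": "2009"}
text = "Smith, J. Learning to align records. In Proc. 2009."

labels = project_labels(record, text)
for token, label in labels:
    print(f"{token}\t{label}")
```

The resulting token-label pairs are the kind of induced annotation that could then serve as training data for a sequence extractor. A learned alignment, as in the abstract, would additionally have to handle abbreviations, reordering, and tokens missing from the record, none of which exact matching handles.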