Paper: Improved Extraction Assessment through Better Language Models

ACL ID N10-1026
Title Improved Extraction Assessment through Better Language Models
Venue Human Language Technologies
Session Main Conference
Year 2010
Authors

A variety of information extraction techniques rely on the fact that instances of the same relation are “distributionally similar,” in that they tend to appear in similar textual contexts. We demonstrate that extraction accuracy depends heavily on the accuracy of the language model utilized to estimate distributional similarity. An unsupervised model selection technique based on this observation is shown to reduce extraction and type-checking error by 26% over previous results, in experiments with Hidden Markov Models. The results suggest that optimizing statistical language models over unlabeled data is a promising direction for improving weakly supervised and unsupervised information extraction.
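
The model selection idea in the abstract can be illustrated with a minimal sketch: among several candidate language models, keep the one that scores best on unlabeled held-out text. This is not the paper's implementation. The abstract reports experiments with Hidden Markov Models, whereas the sketch swaps in add-k smoothed bigram models to stay short, and it assumes held-out perplexity as the measure of language model accuracy. All names (`train_bigram`, `perplexity`, `select_model`, `candidate_ks`) are hypothetical.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count context words and bigrams in a list of tokens."""
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams


def perplexity(model, tokens, vocab_size, k):
    """Perplexity of `tokens` under add-k smoothed bigram estimates."""
    contexts, bigrams = model
    log_prob = 0.0
    n = 0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = (bigrams[(prev, word)] + k) / (contexts[prev] + k * vocab_size)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)


def select_model(train_tokens, heldout_tokens, candidate_ks):
    """Pick the smoothing constant with the lowest held-out perplexity.

    No labels are needed: both inputs are plain token sequences.
    Assumption: lower perplexity stands in for a more accurate model.
    """
    vocab_size = len(set(train_tokens) | set(heldout_tokens))
    model = train_bigram(train_tokens)
    scores = {
        k: perplexity(model, heldout_tokens, vocab_size, k)
        for k in candidate_ks
    }
    best = min(scores, key=scores.get)
    return best, scores
```

A call such as `select_model(train_tokens, heldout_tokens, [0.01, 1.0, 10.0])` returns the chosen constant together with the perplexity of every candidate. Under the abstract's claim, the selected model would then be the one used to estimate distributional similarity for extraction.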