Paper: Cascading Use Of Soft And Hard Matching Pattern Rules For Weakly Supervised Information Extraction

ACL ID C04-1078
Title Cascading Use Of Soft And Hard Matching Pattern Rules For Weakly Supervised Information Extraction
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2004
Authors

Current rule induction techniques based on hard matching (i.e., strict slot-by-slot matching) tend to fare poorly when extracting information from natural language texts, which often exhibit great variation. The reason is that hard matching yields relatively high precision but low recall. To tackle this problem, we take advantage of the recently proposed soft pattern rules, which offer high recall through probabilistic matching. We propose a bootstrapping framework in which soft and hard matching pattern rules are combined in a cascading manner to realize a weakly supervised rule induction scheme. The system starts with a small set of hand-tagged instances. At each iteration, we first generate soft pattern rules and utilize them to tag new training instances automatica...
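
The contrast the abstract draws between hard matching (strict slot-by-slot) and soft matching (probabilistic) can be illustrated with a small sketch. This is not the paper's model: the per-slot relative-frequency score, the fixed window width, the threshold of 0.6, and the names `hard_match`, `train_soft_pattern`, `soft_score`, and `soft_match` are all assumptions made here for illustration only.

```python
from collections import Counter

def hard_match(rule, tokens):
    """Strict slot-by-slot match: every slot must equal the rule's token."""
    return len(rule) == len(tokens) and all(r == t for r, t in zip(rule, tokens))

def train_soft_pattern(examples):
    """Collect per-slot token counts from equal-length context windows.

    Assumption: all tagged examples share the same window width.
    """
    width = len(examples[0])
    counts = [Counter(ex[i] for ex in examples) for i in range(width)]
    return {"counts": counts, "n": len(examples), "width": width}

def soft_score(model, tokens):
    """Average, over slots, of the relative frequency of the observed token.

    A simplified stand-in for probabilistic matching; windows of the
    wrong width score 0.0.
    """
    if len(tokens) != model["width"]:
        return 0.0
    probs = [model["counts"][i][tok] / model["n"] for i, tok in enumerate(tokens)]
    return sum(probs) / model["width"]

def soft_match(model, tokens, threshold=0.6):
    """Accept a candidate when its soft score reaches the (assumed) threshold."""
    return soft_score(model, tokens) >= threshold

if __name__ == "__main__":
    # Hypothetical hand-tagged context windows around a slot of interest.
    seeds = [["was", "killed", "by"], ["was", "shot", "by"], ["was", "murdered", "by"]]
    candidate = ["was", "attacked", "by"]  # unseen variation of the pattern

    model = train_soft_pattern(seeds)
    print("hard:", any(hard_match(rule, candidate) for rule in seeds))
    print("soft:", soft_match(model, candidate), soft_score(model, candidate))
```

In this toy setting the unseen variation fails every hard rule, because one slot differs, yet it passes the soft matcher, because two of its three slots agree with the seed examples. That is the recall gap the abstract attributes to hard matching; a real soft pattern model would also need smoothing and a more careful probability estimate.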