Paper: KnowItNow: Fast Scalable Information Extraction From The Web

ACL ID H05-1071
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2005

Numerous NLP applications rely on search-engine queries, both to extract information from and to compute statistics over the Web corpus. But search engines often limit the number of available queries. As a result, query-intensive NLP applications such as Information Extraction (IE) distribute their query load over several days, making IE a slow, offline process. This paper introduces a novel architecture for IE that obviates queries to commercial search engines. The architecture is embodied in a system called KNOWITNOW that performs high-precision IE in minutes instead of days. We compare KNOWITNOW experimentally with the previously published KNOWITALL system, and quantify the tradeoff between recall and speed. KNOWITNOW’s extraction rate is two to three orders of magnit...