Paper: Automatic Acquisition Of Named Entity Tagged Corpus From World Wide Web

ACL ID P03-2031
Title Automatic Acquisition Of Named Entity Tagged Corpus From World Wide Web
Venue Annual Meeting of the Association for Computational Linguistics
Session System Demonstration
Year 2003
Authors

In this paper, we present a method that automatically constructs a Named Entity (NE) tagged corpus from the web for training Named Entity Recognition systems. We use an NE list and a web search engine to collect web documents that contain the NE instances. The documents are refined through sentence separation and text refinement procedures, and the NE instances are finally tagged with the appropriate NE categories. Our experiments demonstrate that the suggested method can acquire, without any human intervention, an NE tagged corpus large enough to be as useful as a manually tagged one.
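
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the NE list, the category names, the inline tag format, and every function name below are assumptions made for illustration. The web search step is left out, so the documents are assumed to be already retrieved, and the refinement step is reduced to whitespace normalization.

```python
import re

# Hypothetical NE list mapping surface forms to categories (not from the paper).
NE_LIST = {
    "Tim Berners-Lee": "PERSON",
    "Geneva": "LOCATION",
    "CERN": "ORGANIZATION",
}


def split_sentences(text):
    """Naive sentence separation: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def refine(sentence):
    """Stand-in for text refinement: collapse runs of whitespace."""
    return re.sub(r"\s+", " ", sentence).strip()


def tag_sentence(sentence, ne_list):
    """Wrap each NE instance in <CATEGORY>...</CATEGORY> (assumed tag format)."""
    # Try longer names first so a short name cannot pre-empt a longer one.
    names = sorted(ne_list, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def wrap(match):
        name = match.group(1)
        category = ne_list[name]
        return f"<{category}>{name}</{category}>"

    return pattern.sub(wrap, sentence)


def build_corpus(documents, ne_list):
    """Keep only sentences that contain at least one NE instance, tagged."""
    corpus = []
    for doc in documents:
        for sentence in split_sentences(doc):
            sentence = refine(sentence)
            tagged = tag_sentence(sentence, ne_list)
            if tagged != sentence:
                corpus.append(tagged)
    return corpus


# Documents assumed to have been collected by querying a search engine
# with the NE instances; the text here is made up for the example.
documents = [
    "Tim Berners-Lee worked at CERN.  The weather was nice. He lived near   Geneva!"
]
corpus = build_corpus(documents, NE_LIST)
for line in corpus:
    print(line)
```

Matching exact surface forms from the list keeps the sketch short, but it cannot resolve a name that belongs to more than one category, so a real system would need an additional disambiguation step.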