Paper: Mining Wiki Resources for Multilingual Named Entity Recognition

ACL ID P08-1001
Title Mining Wiki Resources for Multilingual Named Entity Recognition
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008
Authors

In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) tags, requiring minimal human intervention and no linguistic expertise. This process, though of value in languages for which resources exist, is particularly useful for less commonly taught languages. We show how the Wikipedia format can be used to identify possible named entities, and we discuss in detail the process by which we use the Category structure inherent to Wikipedia to determine the named entity type of a proposed entity. We further describe the methods by which English-language data can be used to bootstrap the NER process in other languages. We demonstrate the system by using the gen...