Paper: Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia

ACL ID E14-3012
Title Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2014
Authors

In this paper we propose a new methodology to exploit Wikipedia features and structure to automatically develop an Arabic NE annotated corpus. Each Wikipedia link is transformed into an NE type of the target article in order to produce the NE annotation. Other Wikipedia features - namely redirects, anchor texts, and inter-language links - are used to tag additional NEs, which appear without links in Wikipedia texts. Furthermore, we have developed a filtering algorithm to eliminate ambiguity when tagging candidate NEs. Herein we also introduce a mechanism based on the high coverage of Wikipedia in order to address two challenges particular to tagging NEs in Arabic text: rich morphology and the absence of capitalisation. The corpus created with our new method (WDC) has been u...
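
The link-to-annotation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `ARTICLE_TYPES`, `ALIASES`, `annotate`, and `tag_unlinked` are hypothetical, the article-to-type mapping is hard-coded rather than produced by a classifier, and the example strings are English for readability. The filtering algorithm and the handling of Arabic morphology mentioned above are not modelled.

```python
import re

# Hypothetical mapping from a Wikipedia article title to its NE type.
# None marks an article that is not a named entity.
ARTICLE_TYPES = {"Cairo": "LOC", "Naguib Mahfouz": "PER", "Novel": None}

# Hypothetical alternative surface forms, standing in for what redirects
# or anchor texts might supply; each maps to an article title.
ALIASES = {"Mahfouz": "Naguib Mahfouz"}

# Matches [[Target]] and [[Target|surface text]] wiki links.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def tag_unlinked(text):
    """Tag alias mentions in text that carries no links, in document order."""
    found = []
    for alias, target in ALIASES.items():
        ne_type = ARTICLE_TYPES.get(target)
        if ne_type is None:
            continue
        for m in re.finditer(r"\b" + re.escape(alias) + r"\b", text):
            found.append((m.start(), alias, ne_type))
    return [(alias, ne_type) for _, alias, ne_type in sorted(found)]


def annotate(wikitext):
    """Return (surface, ne_type) pairs for linked and unlinked mentions."""
    spans = []
    pos = 0
    for m in LINK.finditer(wikitext):
        # Text between links may contain unlinked mentions.
        spans.extend(tag_unlinked(wikitext[pos:m.start()]))
        target = m.group(1).strip()
        surface = (m.group(2) or target).strip()
        # A link inherits the NE type of its target article, if it has one.
        ne_type = ARTICLE_TYPES.get(target)
        if ne_type is not None:
            spans.append((surface, ne_type))
        pos = m.end()
    spans.extend(tag_unlinked(wikitext[pos:]))
    return spans
```

For the input `"[[Naguib Mahfouz]] was born in [[Cairo]]. Mahfouz wrote a [[Novel|novel]]."`, `annotate` returns `[("Naguib Mahfouz", "PER"), ("Cairo", "LOC"), ("Mahfouz", "PER")]`: the two links take the types of their target articles, the unlinked "Mahfouz" is tagged through the alias table, and the link to "Novel" is skipped because its target is not a named entity.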