Paper: Weakly Supervised Named Entity Transliteration And Discovery From Multilingual Comparable Corpora

ACL ID: P06-1103
Title: Weakly Supervised Named Entity Transliteration And Discovery From Multilingual Comparable Corpora
Venue: Annual Meeting of the Association for Computational Linguistics
Session: Main Conference
Year: 2006
Authors:

Named Entity Recognition (NER) is an important part of many natural language processing tasks. Current approaches often employ machine learning techniques and require supervised data. However, many languages lack such resources. This paper presents an (almost) unsupervised learning algorithm for automatic discovery of Named Entities (NEs) in a resource-free language, given a bilingual corpus in which it is weakly temporally aligned with a resource-rich language. NEs have similar time distributions across such corpora, and often some of the tokens in a multi-word NE are transliterated. We develop an algorithm that exploits both observations iteratively. The algorithm makes use of a new, frequency-based metric for time distributions and a resource-free discriminative approach t...
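The abstract is cut off before the frequency-based metric is defined, so the following is only an illustration of the general idea of comparing time distributions in the frequency domain, not the authors' metric. It assumes each NE is represented by per-day mention counts over the same time bins in both corpora, normalizes each series, keeps the first `k` discrete Fourier transform coefficients, and takes the Euclidean distance between them. The helper names (`normalize`, `dft_coeffs`, `freq_distance`), the use of the DFT, the cutoff `k`, and the toy counts are all hypothetical.

```python
import cmath
import math


def normalize(counts):
    """Turn raw per-bin mention counts into a distribution summing to 1."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def dft_coeffs(series, k):
    """Return the first k complex DFT coefficients of a real-valued series.

    Low-frequency coefficients capture the coarse shape of the time
    distribution; dropping the higher ones smooths over small day-level offsets.
    """
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(series))
        for f in range(min(k, n))
    ]


def freq_distance(a, b, k=4):
    """Hypothetical frequency-domain distance between two count series.

    Both series must use the same time bins (e.g. one bin per day).
    Smaller values mean more similar time distributions.
    """
    if len(a) != len(b):
        raise ValueError("series must cover the same time bins")
    fa = dft_coeffs(normalize(a), k)
    fb = dft_coeffs(normalize(b), k)
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(fa, fb)))


# Toy daily counts (made up): an NE in the resource-rich language, a candidate
# in the other language that peaks on the same days, and one that peaks later.
english_ne = [0, 5, 9, 2, 0, 0, 1, 0]
good_candidate = [0, 4, 10, 3, 0, 0, 0, 1]
bad_candidate = [3, 0, 0, 0, 2, 8, 6, 0]

print(freq_distance(english_ne, good_candidate) < freq_distance(english_ne, bad_candidate))
```

Normalizing first means the score ignores how often a name is mentioned overall and compares only when it is mentioned. Because the complex coefficients keep their phase, a candidate whose mentions peak on different days still scores as distant, while truncating to low frequencies makes the comparison somewhat tolerant of the weak temporal alignment between the corpora.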