Paper: Active Sample Selection for Named Entity Transliteration

ACL ID P08-2014
Title Active Sample Selection for Named Entity Transliteration
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008

This paper introduces a new method for identifying named-entity (NE) transliterations within bilingual corpora. Current state-of-the-art approaches usually require annotated data and relevant linguistic knowledge which may not be available for all languages. We show how to effectively train an accurate transliteration classifier using very little data, obtained automatically. To perform this task, we introduce a new active sampling paradigm for guiding and adapting the sample selection process. We also investigate how to improve the classifier by identifying repeated patterns in the training data. We evaluated our approach using English, Russian and Hebrew corpora.