Paper: Unsupervised Constraint Driven Learning For Transliteration Discovery

ACL ID N09-1034
Title Unsupervised Constraint Driven Learning For Transliteration Discovery
Venue Human Language Technologies
Session Main Conference
Year 2009
Authors

This paper introduces a novel unsupervised constraint-driven learning algorithm for identifying named-entity (NE) transliterations in bilingual corpora. The proposed method does not require any annotated data or aligned corpora. Instead, it is bootstrapped using a simple resource: a romanization table. We show that this resource, when used in conjunction with constraints, can efficiently identify transliteration pairs. We evaluate the proposed method on transliterating English NEs to three different languages: Chinese, Russian and Hebrew. Our experiments show that constraint-driven learning can significantly outperform existing unsupervised models and achieve results competitive with existing supervised models.