Paper: Finding Ideographic Representations Of Japanese Names Written In Latin Script Via Language Identification And Corpus Validation

ACL ID P04-1024
Title Finding Ideographic Representations Of Japanese Names Written In Latin Script Via Language Identification And Corpus Validation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2004
Authors

Multilingual applications frequently involve dealing with proper names, but names are often missing in bilingual lexicons. This problem is exacerbated for applications involving translation between Latin-scripted languages and Asian languages such as Chinese, Japanese and Korean (CJK), where simple string copying is not a solution. We present a novel approach for generating the ideographic representations of a CJK name written in a Latin script. The proposed approach involves first identifying the origin of the name, and then back-transliterating the name to all possible Chinese characters using language-specific mappings. To reduce the massive number of possibilities for computation, we apply a three-tier filtering process by filtering first through a set of attested bigrams, then through ...
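The back-transliteration and first filtering tier described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy reading-to-character table `READING_TO_KANJI`, the toy bigram set `ATTESTED_BIGRAMS`, and the helper names `segment`, `candidates`, and `filter_by_bigrams`. The language identification step is skipped (the input is assumed to be already identified as Japanese), and only the attested-bigram tier is shown, since the remaining tiers are not described in the text available here.

```python
from itertools import product

# Hypothetical toy mapping from romanized readings to candidate characters.
# A real system would derive this from a language-specific resource.
READING_TO_KANJI = {
    "yama": ["山"],
    "da": ["田", "打"],
    "ta": ["田", "太"],
}

# Hypothetical set of character bigrams attested in some name list.
ATTESTED_BIGRAMS = {"山田"}


def segment(name, readings):
    """Yield every way to split `name` into a sequence of known readings."""
    if not name:
        yield []
        return
    for end in range(1, len(name) + 1):
        head = name[:end]
        if head in readings:
            for rest in segment(name[end:], readings):
                yield [head] + rest


def candidates(name, mapping):
    """Yield every character string obtainable from any segmentation."""
    for readings in segment(name, mapping):
        # Cartesian product over the candidate characters of each reading.
        for chars in product(*(mapping[r] for r in readings)):
            yield "".join(chars)


def filter_by_bigrams(cands, attested):
    """Keep candidates whose adjacent character pairs are all attested.

    A single-character candidate has no bigrams and is kept as-is.
    """
    return [
        c for c in cands
        if all(c[i:i + 2] in attested for i in range(len(c) - 1))
    ]


all_candidates = sorted(set(candidates("yamada", READING_TO_KANJI)))
kept = filter_by_bigrams(all_candidates, ATTESTED_BIGRAMS)
print(all_candidates)
print(kept)
```

Even in this toy setting the candidate set grows as the product of the per-reading alternatives, which is why a cheap filter such as bigram attestation is applied before any more expensive validation.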