Paper: Urdu and Hindi: Translation and sharing of linguistic resources

ACL ID C10-2147
Title Urdu and Hindi: Translation and sharing of linguistic resources
Venue International Conference on Computational Linguistics
Session Poster Session
Year 2010
Authors

Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, the vocabularies have also diverged significantly, especially in the written form. In this paper we show that we can get reasonable quality translations (we estimated the Translation Error rate at 18%) between the two languages even in the absence of a parallel corpus. Linguistic resources such as treebanks, part of speech tagged data and parallel corpora with English are limited for both these languages. We use the translation system to share linguistic resources between the two languages. We demonstrate improvements on three tasks and show: statistical machine translation from Urdu to English is improved (0.8 in BLEU score) by using a Hindi-English parallel corpus, Hindi part of s...