Paper: POS Tagging of English-Hindi Code-Mixed Social Media Content

ACL ID D14-1105
Title POS Tagging of English-Hindi Code-Mixed Social Media Content
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

Code-mixing is frequently observed in user-generated content on social media, especially from multilingual users. The linguistic complexity of such content is compounded by the presence of spelling variations, transliteration and non-adherence to formal grammar. We describe our initial efforts to create a multi-level annotated corpus of Hindi-English code-mixed text collated from Facebook forums, and explore language identification, back-transliteration, normalization and POS tagging of this data. Our results show that language identification and transliteration for Hindi are two major challenges that impact POS tagging accuracy.
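
The four steps named in the abstract (language identification, back-transliteration, normalization and POS tagging) can be pictured as a per-token pipeline. The following is only a minimal sketch under stated assumptions, not the authors' system: the tiny lexicons, the tag names, the ordering of the steps and every function name are hypothetical and invented here for illustration. A real system would replace each lookup with a trained model.

```python
import re
from typing import List, Tuple

# Hypothetical toy resources, invented for illustration (not from the paper).
ENGLISH_WORDS = {"the", "movie", "was", "good"}
HINDI_TRANSLIT = {"bahut": "बहुत", "accha": "अच्छा", "thi": "थी"}
POS_LEXICON = {
    "the": "DET", "movie": "NOUN", "was": "VERB", "good": "ADJ",
    "बहुत": "ADV", "अच्छा": "ADJ", "थी": "VERB",
}


def normalize(token: str) -> str:
    """Lowercase and squeeze runs of 3+ identical characters down to 2."""
    return re.sub(r"(.)\1{2,}", r"\1\1", token.lower())


def identify_language(token: str) -> str:
    """Word-level language label by dictionary lookup (assumed strategy)."""
    if token in HINDI_TRANSLIT:
        return "hi"
    if token in ENGLISH_WORDS:
        return "en"
    return "unk"


def back_transliterate(token: str) -> str:
    """Map a Romanized Hindi token to Devanagari; unknown tokens pass through."""
    return HINDI_TRANSLIT.get(token, token)


def tag_sentence(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Return (surface form, language label, POS tag) for each input token."""
    tagged = []
    for raw in tokens:
        token = normalize(raw)
        lang = identify_language(token)
        surface = back_transliterate(token) if lang == "hi" else token
        # "X" marks tokens the toy lexicon cannot tag.
        tagged.append((surface, lang, POS_LEXICON.get(surface, "X")))
    return tagged


if __name__ == "__main__":
    for row in tag_sentence(["Movie", "bahut", "goooood", "thi"]):
        print(row)
```

Because each later step consumes the output of the earlier ones, a wrong language label or a wrong transliteration in this sketch yields an unusable lookup key and therefore the fallback tag, which mirrors the abstract's observation that errors in those two steps affect POS tagging accuracy.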