Paper: Statistical models for text normalization and machine translation

ACL ID W14-4605
Title Statistical models for text normalization and machine translation
Venue Celtic Language Technology Workshop
Year 2014

Irish and Scottish Gaelic are closely related languages that, together with Manx Gaelic, make up the Goidelic branch of the Celtic family. We present a statistical model for translation from Scottish Gaelic to Irish that we hope will facilitate communication between the two language communities, especially in social media. An important aspect of this work is overcoming the orthographic differences between the languages, many of which were introduced in a major spelling reform of Irish in the 1940s and 1950s. Before that reform, the orthographies of the two languages were quite similar, thanks in part to a shared literary tradition. As a consequence, machine translation from Scottish Gaelic to Irish has a great deal in common with the problem of normalizing pre-standard Irish tex...
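Both tasks can be framed in the same way: find the most probable standard Irish string for a given input. As a generic sketch, the standard noisy-channel formulation of statistical translation is shown below. This is an illustrative assumption, not necessarily the exact model used in this paper. Here $s$ is the source string (Scottish Gaelic, or pre-standard Irish) and $t$ ranges over candidate standard Irish strings:

$$\hat{t} = \arg\max_{t} P(t \mid s) = \arg\max_{t} P(s \mid t)\,P(t)$$

$P(t)$ is a language model over standard Irish, and $P(s \mid t)$ models how a standard Irish string is realized as Scottish Gaelic or as older Irish spelling. The denominator $P(s)$ from Bayes' rule is dropped because it does not depend on $t$. Under this view, the same language model $P(t)$ can be shared between translation and normalization, and only the channel model differs.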