Paper: Machine Translation without Words through Substring Alignment

ACL ID P12-1018
Title Machine Translation without Words through Substring Alignment
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2012

In this paper, we demonstrate that accurate machine translation is possible without the concept of "words," treating MT as a problem of transformation between character strings. We achieve this result by applying phrasal inversion transduction grammar alignment techniques to character strings to train a character-based translation model, and using this in the phrase-based MT framework. We also propose a look-ahead parsing algorithm and substring-informed prior probabilities to achieve more effective and efficient alignment. In an evaluation, we demonstrate that character-based translation can achieve results that compare to word-based systems while effectively translating unknown and uncommon words over several language pairs.
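
To make the idea of treating MT as a transformation between character strings concrete, the following minimal sketch shows one way a sentence could be turned into a sequence of single-character tokens before alignment, and restored afterwards. It is an illustration only, not the authors' preprocessing: the function names `to_char_tokens` and `from_char_tokens` and the underscore used as a visible space symbol are assumptions introduced here.

```python
from typing import List


def to_char_tokens(sentence: str, space_symbol: str = "_") -> List[str]:
    """Split a sentence into single-character tokens.

    Runs of whitespace are collapsed to one space, and each space is
    replaced by `space_symbol` (an assumed convention) so that word
    boundaries remain visible as ordinary tokens.
    """
    normalized = " ".join(sentence.split())
    return [space_symbol if ch == " " else ch for ch in normalized]


def from_char_tokens(tokens: List[str], space_symbol: str = "_") -> str:
    """Invert to_char_tokens: join characters and restore spaces."""
    return "".join(" " if tok == space_symbol else tok for tok in tokens)


tokens = to_char_tokens("the  cat")
print(tokens)                    # ['t', 'h', 'e', '_', 'c', 'a', 't']
print(from_char_tokens(tokens))  # the cat
```

With this representation, a standard phrase-based pipeline sees each character as a "word", so an aligned substring plays the role that a phrase plays in a word-based system. A sentence that already contains the chosen space symbol would not round-trip cleanly, so a real system would need a symbol that cannot occur in the data.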