Paper: Machine Translation without Words through Substring Alignment

ACL ID P12-1018
Title Machine Translation without Words through Substring Alignment
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

In this paper, we demonstrate that accurate machine translation is possible without the concept of "words," treating MT as a problem of transformation between character strings. We achieve this result by applying phrasal inversion transduction grammar alignment techniques to character strings to train a character-based translation model, and using this in the phrase-based MT framework. We also propose a look-ahead parsing algorithm and substring-informed prior probabilities to achieve more effective and efficient alignment. In an evaluation, we demonstrate that character-based translation can achieve results that compare to word-based systems while effectively translating unknown and uncommon words over several language pairs.
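
To make the idea of treating MT as transformation between character strings concrete, the following minimal sketch shows one way a sentence could be turned into a sequence of single-character tokens so that a phrase-based pipeline sees characters instead of words. The function names and the "_" boundary marker are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def to_char_tokens(sentence: str, boundary: str = "_") -> List[str]:
    """Split a sentence into one token per character.

    Assumption (not from the paper): whitespace between words is replaced
    by a visible boundary marker so it survives as an ordinary token.
    """
    # Collapse runs of whitespace, then join words with the marker.
    words = sentence.split()
    return list(boundary.join(words))


def from_char_tokens(tokens: List[str], boundary: str = "_") -> str:
    """Invert to_char_tokens: concatenate characters and restore spaces.

    Note: the round trip is only exact if the original text does not
    itself contain the boundary marker.
    """
    return "".join(tokens).replace(boundary, " ")


if __name__ == "__main__":
    tokens = to_char_tokens("the cat")
    print(" ".join(tokens))
    print(from_char_tokens(tokens))
```

Under this representation an unknown or uncommon word is just another run of characters, so substrings aligned elsewhere in the training data can still contribute to its translation.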