Paper: Bilingual Cluster Based Models for Statistical Machine Translation

ACL ID D07-1054
Title Bilingual Cluster Based Models for Statistical Machine Translation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
  • Hirohumi Yamamoto (National Institute of Information and Communications Technology, Kyoto Japan; ATR Spoken Language Communication Research Laboratories, Kyoto Japan)
  • Eiichiro Sumita

We propose a domain specific model for statistical machine translation. It is well known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided; however, in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical...