ACL Anthology Network (All About NLP) (beta) The Association Of Computational Linguistics Anthology Network |
ACL ID | P14-2093 |
---|---|
Title | Effective Selection of Translation Model Training Data |
Venue | Annual Meeting of the Association of Computational Linguistics |
Session | Main Conference |
Year | 2014 |
Authors |
Data selection has been demonstrated to be an effective approach to addressing the lack of high-quality bitext for statistical machine translation in the domain of interest. Most current data selection methods solely use language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus. By contrast, we argue that the relevance between a sentence pair and the target domain can be better evaluated by the combination of a language model and a translation model. In this paper, we study and experiment with novel methods that apply translation models to domain-relevant data selection. The results show that our methods outperform previous methods. When the selected sentence pairs are evaluated on an end-to-end MT ...
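
As a rough illustration of the language-model-only selection that the abstract describes as the prevailing baseline, the sketch below ranks general-domain sentence pairs by their per-word cross-entropy under in-domain language models and keeps the best-scoring pairs. It is not the paper's method: the unigram model, the add-one smoothing, the function names (`train_unigram`, `cross_entropy`, `select_pairs`), and the toy data are all assumptions made for this example, and the translation-model scoring the paper proposes is not shown because the abstract gives no details of it.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Build an add-one-smoothed unigram model; returns a log-probability function."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def cross_entropy(sentence, logprob):
    """Per-word cross-entropy of a sentence under the model (lower = more in-domain)."""
    toks = sentence.split()
    if not toks:
        return float("inf")  # empty sentences are ranked last
    return -sum(logprob(t) for t in toks) / len(toks)


def select_pairs(pairs, in_domain_src, in_domain_tgt, k):
    """Keep the k sentence pairs with the lowest summed source + target cross-entropy."""
    src_lm = train_unigram(in_domain_src)
    tgt_lm = train_unigram(in_domain_tgt)
    ranked = sorted(
        pairs,
        key=lambda p: cross_entropy(p[0], src_lm) + cross_entropy(p[1], tgt_lm),
    )
    return ranked[:k]


# Toy data (hypothetical): a tiny in-domain corpus and two general-domain pairs.
in_src = ["the patient took the syrup", "the doctor saw the patient"]
in_tgt = ["le patient a pris le sirop", "le docteur a vu le patient"]
general = [
    ("stocks fell sharply today", "les actions ont baisse aujourd'hui"),
    ("the patient saw the doctor", "le patient a vu le docteur"),
]
selected = select_pairs(general, in_src, in_tgt, k=1)
```

Here `selected` holds the medical sentence pair, since every one of its tokens appears in the in-domain data while the finance pair consists entirely of unseen tokens. A real system would use higher-order language models trained on much larger corpora.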