Paper: Phrasal: A Statistical Machine Translation Toolkit for Exploring New Model Features

ACL ID N10-2003
Title Phrasal: A Statistical Machine Translation Toolkit for Exploring New Model Features
Venue Human Language Technologies
Session System Demonstration
Year 2010
Authors

We present a new Java-based open source toolkit for phrase-based machine translation. The key innovation provided by the toolkit is to use APIs for integrating new features (/knowledge sources) into the decoding model and for extracting feature statistics from aligned bitexts. The package includes a number of useful features written to these APIs, including features for hierarchical reordering, discriminatively trained linear distortion, and syntax-based language models. Other useful utilities packaged with the toolkit include: a conditional phrase extraction system that builds a phrase table just for a specific dataset; and an implementation of MERT that allows for pluggable evaluation metrics for both training and evaluation with built-in support for a variety of metrics (e....
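
To make the idea of integrating features through an API concrete, the following is a minimal sketch of how a pluggable feature might feed a linear decoding model. It is an illustration only and does not reproduce the toolkit's actual API: the names `PhraseFeature`, `WordCountFeature`, `LengthMismatchFeature`, and `LinearModel`, as well as the weights, are hypothetical and chosen for this example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical interface (not the toolkit's real API): a feature assigns
// a real-valued score to one source/target phrase pair.
interface PhraseFeature {
    String name();
    double score(List<String> sourcePhrase, List<String> targetPhrase);
}

// Illustrative feature: the number of target words produced.
final class WordCountFeature implements PhraseFeature {
    public String name() { return "WordCount"; }
    public double score(List<String> sourcePhrase, List<String> targetPhrase) {
        return targetPhrase.size();
    }
}

// Illustrative feature: absolute difference between phrase lengths.
final class LengthMismatchFeature implements PhraseFeature {
    public String name() { return "LengthMismatch"; }
    public double score(List<String> sourcePhrase, List<String> targetPhrase) {
        return Math.abs(sourcePhrase.size() - targetPhrase.size());
    }
}

// A linear model: the score of a phrase pair is the weighted sum of all
// registered feature values. Features without a weight contribute zero.
final class LinearModel {
    private final List<PhraseFeature> features;
    private final Map<String, Double> weights;

    LinearModel(List<PhraseFeature> features, Map<String, Double> weights) {
        this.features = features;
        this.weights = weights;
    }

    double score(List<String> sourcePhrase, List<String> targetPhrase) {
        double total = 0.0;
        for (PhraseFeature f : features) {
            total += weights.getOrDefault(f.name(), 0.0)
                    * f.score(sourcePhrase, targetPhrase);
        }
        return total;
    }
}

class FeatureSketch {
    public static void main(String[] args) {
        // Weights are made up for the example; in practice they would be tuned.
        LinearModel model = new LinearModel(
                List.of(new WordCountFeature(), new LengthMismatchFeature()),
                Map.of("WordCount", -0.5, "LengthMismatch", -1.0));
        double s = model.score(List.of("das", "haus"), List.of("the", "house"));
        System.out.println(s);
    }
}
```

Under this design, adding a new knowledge source means writing one more class that implements the interface and registering it with a weight; the scoring loop itself stays unchanged.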