Paper: Boosting Statistical Word Alignment Using Labeled And Unlabeled Data

ACL ID P06-2117
Title Boosting Statistical Word Alignment Using Labeled And Unlabeled Data
Venue Annual Meeting of the Association of Computational Linguistics
Session Poster Session
Year 2006
Authors

This paper proposes a semi-supervised boosting approach to improve statistical word alignment with limited labeled data and large amounts of unlabeled data. The proposed approach modifies the super- vised boosting algorithm to a semi- supervised learning algorithm by incor- porating the unlabeled data. In this algo- rithm, we build a word aligner by using both the labeled data and the unlabeled data. Then we build a pseudo reference set for the unlabeled data, and calculate the error rate of each word aligner using only the labeled data. Based on this semi- supervised boosting algorithm, we inves- tigate two boosting methods for word alignment. In addition, we improve the word alignment results by combining the results of the two semi-supervised boost- ing methods. Experimental results on ...