Paper: Improving Word Alignment Using Linguistic Code Switching Data

ACL ID E14-1001
Title Improving Word Alignment Using Linguistic Code Switching Data
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2014
Authors

Linguistic Code Switching (LCS) is a situation in which two or more languages appear in the context of a single conversation. For example, in English-Chinese code switching, a speaker might write a sentence mostly in Chinese characters that also contains the number 15 and the English word "meeting" (meaning "We will have a meeting in 15 minutes"). Traditional machine translation (MT) systems treat LCS data either as noise or as regular sentences. However, if LCS data is processed intelligently, it can provide a useful signal for training word alignment and MT models. Moreover, LCS data comes from non-news sources, so it can increase the diversity of MT training data. In this paper, we first extract constraints from code switching data and then incorporate them into the training procedure of a word alignment model. We also show that by using the code switching data, we can jointly...
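The abstract does not specify what form the extracted constraints take or how they enter training. A minimal sketch, under our own assumptions and not the authors' stated method, is shown below. We assume that each code-switched sentence has an English translation. Any ASCII token in the code-switched sentence (such as "meeting" or "15") is taken as a hard anchor to the identical token in the translation, but only when that token occurs exactly once there. These anchors are imposed inside ordinary IBM Model 1 EM training: a constrained position places all of its posterior mass on its anchor, and every other position is handled by the usual Model 1 E-step. The names `extract_constraints`, `train_model1`, `viterbi_align`, and the toy sentence pairs are hypothetical illustrations.

```python
# Hypothetical sketch: hard anchor constraints from code-switched tokens,
# imposed during IBM Model 1 EM training. Not the paper's actual method.
from collections import defaultdict

NULL = "<NULL>"


def extract_constraints(mixed, english):
    """Assumed rule: an ASCII token in the code-switched sentence is anchored
    to the identical English token, if that token occurs exactly once."""
    constraints = {}
    for j, tok in enumerate(mixed):
        if tok.isascii() and tok.isalnum():
            hits = [i for i, e in enumerate(english) if e.lower() == tok.lower()]
            if len(hits) == 1:
                constraints[j] = hits[0]
    return constraints


def train_model1(corpus, iterations=5):
    """IBM Model 1 EM over (mixed, english, constraints) triples.
    t[(f, e)] estimates p(f | e); constrained positions are not re-estimated
    by the E-step but put all their posterior mass on the anchored word."""
    f_vocab = {f for mixed, _, _ in corpus for f in mixed}
    uniform = 1.0 / len(f_vocab)
    # Uniform initialization over every co-occurring (f, e) pair.
    t = {(f, e): uniform
         for mixed, english, _ in corpus
         for f in mixed for e in [NULL] + english}
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for mixed, english, cons in corpus:
            targets = [NULL] + english
            for j, f in enumerate(mixed):
                if j in cons:
                    posterior = {cons[j] + 1: 1.0}  # +1 skips the NULL slot
                else:
                    z = sum(t[(f, e)] for e in targets)
                    posterior = {i: t[(f, e)] / z for i, e in enumerate(targets)}
                for i, p in posterior.items():
                    count[(f, targets[i])] += p
                    total[targets[i]] += p
        # M-step: renormalize expected counts per English word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def viterbi_align(mixed, english, cons, t):
    """Best English link per code-switched token; NULL links are dropped."""
    targets = [NULL] + english
    links = []
    for j, f in enumerate(mixed):
        if j in cons:
            links.append((j, cons[j]))
            continue
        best = max(range(len(targets)), key=lambda i: t.get((f, targets[i]), 0.0))
        if best > 0:
            links.append((j, best - 1))
    return links


# Made-up tokenized toy pairs, for illustration only.
pairs = [
    (["我们", "15", "分钟", "后", "开", "meeting"],
     ["we", "will", "have", "a", "meeting", "in", "15", "minutes"]),
    (["我们", "开", "会"], ["we", "have", "a", "meeting"]),
]
corpus = [(m, e, extract_constraints(m, e)) for m, e in pairs]
t = train_model1(corpus, iterations=10)
print(viterbi_align(*corpus[0], t))
```

In this sketch the anchors act as fixed points, so the probability mass they would otherwise spread over competing English words is redistributed by EM to the remaining, unconstrained words. Soft constraints, such as a prior that boosts but does not force an anchored link, would be an equally plausible alternative design.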