Paper: Modelling Lexical Redundancy For Machine Translation

ACL ID P06-1122
Title Modelling Lexical Redundancy For Machine Translation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2006
Authors

Certain distinctions made in the lexicon of one language may be redundant when translating into another language. We quantify redundancy among source types by the similarity of their distributions over target types. We propose a language-independent framework for minimising lexical redundancy that can be optimised directly from parallel text. Optimisation of the source lexicon for a given target language is viewed as model selection over a set of cluster-based translation models. Redundant distinctions between types may exhibit monolingual regularities, for example, inflexion patterns. We define a prior over model structure using a Markov random field and learn features over sets of monolingual types that are predictive of bilingual redundancy. The prior makes model selection more robust ...
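
The idea of measuring redundancy by comparing distributions over target types can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so the Jensen-Shannon divergence below is an assumption chosen for illustration, and the French-English counts are invented toy data, not results from the paper. A low divergence suggests that two source types translate alike, so the distinction between them may be redundant for this target language.

```python
import math

def normalise(counts):
    """Turn raw target-type counts into a probability distribution."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def kl(a, b):
    """KL divergence in bits; b must be non-zero wherever a is non-zero."""
    return sum(pa * math.log2(pa / b[t]) for t, pa in a.items() if pa > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint).

    Assumption: JS divergence stands in for the unspecified similarity measure.
    """
    support = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in support}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical alignment counts: source type -> {target type: count}.
toy_counts = {
    "petit":  {"small": 6, "little": 4},
    "petite": {"small": 5, "little": 5},
    "chat":   {"cat": 10},
}
dists = {s: normalise(c) for s, c in toy_counts.items()}

# Gender inflexion of the adjective barely changes its English translations.
d_inflected = js_divergence(dists["petit"], dists["petite"])
# Unrelated source types share no target types at all.
d_unrelated = js_divergence(dists["petit"], dists["chat"])

print(f"petit vs petite: {d_inflected:.4f} bits")
print(f"petit vs chat:   {d_unrelated:.4f} bits")
```

In this toy setting the two inflected forms are far closer to each other than either is to the unrelated noun, which is the kind of signal a cluster-based model could use when deciding whether to merge source types.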