Paper: Putting a Value on Comparable Data

ACL ID W11-1201
Title Putting a Value on Comparable Data
Venue Building and Using Comparable Corpora
Year 2011

Machine translation began in 1947 with an influential memo by Warren Weaver. In that memo, Weaver noted that human code-breakers could transform ciphers into natural language (e.g., into Turkish) without access to parallel ciphertext/plaintext data, and without knowing the plaintext language’s syntax and semantics. Simple word and letter statistics seemed to be enough for the task. Weaver then predicted that such statistical methods could also solve a tougher problem, namely language translation. This raises the question: can sufficient translation knowledge be derived from comparable (non-parallel) data? In this talk, I will discuss initial work in treating foreign language as a code for English, where we assume the code to involve both word subst...