Putting a Value on Comparable Data

2011

Machine translation began in 1947 with an influential memo by Warren Weaver. In that memo, Weaver noted that human code-breakers could transform ciphers into natural language (e.g., into Turkish) without access to parallel ciphertext/plaintext data and without knowing the plaintext language’s syntax and semantics. Simple word and letter statistics seemed to be enough for the task. Weaver then predicted that such statistical methods could also solve a tougher problem, namely language translation. This raises the question: can sufficient translation knowledge be derived from comparable (non-parallel) data? In this talk, I will discuss initial work in treating foreign language as a code for English, where we assume the code to involve both word subst...