ACL ID W04-2602
Venue Computational Lexical Semantics Workshop
Year 2004

We describe work in progress aimed at developing methods for automatically constructing a lexicon using only statistical data derived from analysis of corpora, a problem we call lexical optimization. Specifically, we use statistical methods alone to obtain information equivalent to syntactic categories, and to discover the semantically meaningful units of text, which may be multi-word units or polysemous terms-in-context. Our guiding principle is to employ a notion of meaningfulness that can be quantified information-theoretically, so that plausible variants of a lexicon can be judged relative to each other. We describe a technique of this nature called information theoretic co-clustering and give results of a series of experiments built around it that demonstrate the main ingredie...
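The abstract does not spell out the co-clustering objective, so what follows is only a minimal sketch. It assumes the usual hard-clustering formulation of information theoretic co-clustering: words X and contexts Y are mapped to clusters X̂ and Ŷ, and a candidate co-clustering is scored by how much mutual information it discards, I(X;Y) - I(X̂;Ŷ). A lower loss means the clusters preserve more of the word-context dependence, which illustrates one way lexicon variants could be compared quantitatively. The function names (`joint_from_counts`, `mutual_information`, `cluster_joint`, `information_loss`) and the toy counts are hypothetical, and the sketch only evaluates the objective; it does not perform the search for good clusters.

```python
import math
from collections import defaultdict

# Hypothetical sketch: scores a hard co-clustering by the loss in mutual
# information, I(X;Y) - I(X_hat;Y_hat). Not the authors' implementation.

def joint_from_counts(counts):
    """Normalize {(word, context): count} into a joint distribution p(x, y)."""
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

def mutual_information(joint):
    """Mutual information in bits of a joint distribution {(x, y): p}."""
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0:  # zero-probability cells contribute nothing
            mi += p * math.log2(p / (px[x] * py[y]))
    return mi

def cluster_joint(joint, row_cluster, col_cluster):
    """Sum p(x, y) within clusters to obtain p(x_hat, y_hat)."""
    clustered = defaultdict(float)
    for (x, y), p in joint.items():
        clustered[(row_cluster[x], col_cluster[y])] += p
    return dict(clustered)

def information_loss(joint, row_cluster, col_cluster):
    """Co-clustering objective (assumed): I(X;Y) - I(X_hat;Y_hat), never negative."""
    return mutual_information(joint) - mutual_information(
        cluster_joint(joint, row_cluster, col_cluster))

# Toy word-context counts (invented for illustration only).
counts = {
    ("cat", "the_"): 4, ("cat", "to_"): 0,
    ("dog", "the_"): 4, ("dog", "to_"): 0,
    ("run", "the_"): 0, ("run", "to_"): 4,
    ("walk", "the_"): 0, ("walk", "to_"): 4,
}
joint = joint_from_counts(counts)
contexts = {"the_": "the_", "to_": "to_"}  # leave contexts unclustered

good = {"cat": 0, "dog": 0, "run": 1, "walk": 1}  # nouns vs. verbs
bad = {"cat": 0, "run": 0, "dog": 1, "walk": 1}   # mixes the two

print(information_loss(joint, good, contexts))  # → 0.0
print(information_loss(joint, bad, contexts))   # → 1.0
```

Grouping the nouns together loses no information about context, whereas the mixed grouping discards the full 1 bit, so the first word clustering is preferred. An actual co-clustering procedure would iteratively reassign words and contexts to reduce this loss; the paper's own procedure, data, and settings are not described in this abstract.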