Paper: A Model Of Lexical Attraction And Repulsion

ACL ID P97-1048
Title A Model Of Lexical Attraction And Repulsion
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1997

This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. Empirical data drawn from English and Japanese text, as well as conversational speech, reveals that the "attraction" between words decays exponentially, while stylistic and syntactic constraints create a "repulsion" between words that discourages close co-occurrence. We show that these characteristics are well described by simple mixture models based on two-stage exponential distributions which can be trained using the EM algorithm. The resulting dista...
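The shape the abstract describes, repulsion at very short distances followed by exponentially decaying attraction, can be illustrated with a minimal sketch. It assumes that "two-stage exponential" means the distance between two words is the sum of two independent exponential stages, and it uses a continuous density for simplicity; the paper's own parameterization may differ. The function names and the rates `mu1` and `mu2` are hypothetical and chosen only for illustration.

```python
import math


def two_stage_density(k, mu1, mu2):
    """Density at distance k of the sum of two independent exponential
    stages with rates mu1 and mu2 (assumed distinct and positive).

    This is the convolution of two exponential densities. It is zero at
    k = 0 (the "repulsion") and decays exponentially for large k
    (the fading "attraction").
    """
    if k < 0:
        return 0.0
    scale = mu1 * mu2 / (mu1 - mu2)
    return scale * (math.exp(-mu2 * k) - math.exp(-mu1 * k))


def peak_distance(mu1, mu2):
    """Distance at which the two-stage density is largest."""
    return math.log(mu1 / mu2) / (mu1 - mu2)


def integrate(f, upper, step=0.01):
    """Trapezoidal integral of f over [0, upper]."""
    n = int(upper / step)
    total = 0.5 * (f(0.0) + f(n * step))
    for i in range(1, n):
        total += f(i * step)
    return total * step


if __name__ == "__main__":
    mu1, mu2 = 0.5, 0.05  # hypothetical rates, not values from the paper
    for k in (0, 1, 5, 20, 100):
        print(k, two_stage_density(k, mu1, mu2))
    print("peak at distance", peak_distance(mu1, mu2))
```

With these hypothetical rates the density starts at zero, rises to a peak a few words away, and then falls off at the slower of the two rates. A single exponential, by contrast, would put its highest density at distance zero and could not represent the repulsion effect.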