Paper: A Random Text Model for the Generation of Statistical Language Invariants

ACL ID N07-1014
Title A Random Text Model for the Generation of Statistical Language Invariants
Venue Human Language Technologies
Session Main Conference
Year 2007
Authors

A novel random text generation model is introduced. Unlike previous random text models, which mainly aim at producing a Zipfian distribution of word frequencies, our model also takes the properties of neighboring co-occurrence into account and introduces the notion of sentences in random text. After pointing out the deficiencies of related models, we provide a generation process that takes neither the Zipfian distribution on word frequencies nor the small-world structure of the neighboring co-occurrence graph as a constraint. Nevertheless, these distributions emerge in the process. The distributions obtained with the random generation model are compared to a sample of natural language data, showing high agreement also on word length and sentence length. This work proposes a pla...
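As an illustration of what a Zipfian distribution of word frequencies means in practice, the following minimal sketch ranks words by frequency and fits a straight line to log frequency against log rank; a slope near -1 indicates a Zipf-like profile. This is not the paper's generation model or its evaluation procedure. The function names `rank_frequency` and `zipf_slope` and the toy token list are assumptions made purely for illustration.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted in descending order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical toy input: the word "w<r>" occurs round(120 / r) times,
# which approximates an ideal Zipfian profile over 20 word types.
toy_tokens = [f"w{r}" for r in range(1, 21) for _ in range(round(120 / r))]
freqs = rank_frequency(toy_tokens)
slope = zipf_slope(freqs)
print(f"fitted log-log slope: {slope:.3f}")
```

On real or generated text, `toy_tokens` would be replaced by the tokenized corpus. A plain least-squares fit on the log-log plot is only a rough diagnostic, chosen here for brevity.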