Paper: The chunk as the period of the functions length and frequency of words on the syntagmatic axis

ACL ID W09-3830
Title The chunk as the period of the functions length and frequency of words on the syntagmatic axis
Venue International Conference on Parsing Technologies
Session Main Conference
Year 2009
Authors

Chunking is the segmentation of a text into chunks, sub-sentential segments that Abney approximately defined as stress groups. Chunking usually relies on monolingual resources, most often exhaustive, sometimes partial: function words and punctuation, which often mark the beginnings and ends of chunks. But to extend this method to other languages, monolingual resources have to be multiplied. We present a new method: endogenous chunking, which uses no resource other than the text to be segmented itself. The idea behind this method comes from Zipf: to minimize communication effort, speakers are driven to shorten frequent words. A chunk can then be characterized as the period of the periodic, correlated functions length and frequency of words on the syntagmatic axis. This original metho...
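
As a rough illustration of the endogenous idea, the sketch below segments a token sequence using only word lengths and word frequencies counted in the text itself. The boundary rule is an assumption made for this example and is not taken from the paper: a new chunk is started at any token that is more frequent than, and no longer than, the token before it, on the reasoning that short, frequent words tend to open a new period. The function name `endogenous_chunks` is hypothetical.

```python
from collections import Counter


def endogenous_chunks(tokens):
    """Split tokens into chunks using only the text's own statistics.

    Assumption (illustrative, not the paper's algorithm): a chunk starts
    at a token that is more frequent than, and no longer than, the
    previous token.
    """
    # Frequencies are counted on the text being segmented, nothing else.
    freq = Counter(tok.lower() for tok in tokens)

    chunks = []
    current = []
    for i, tok in enumerate(tokens):
        if i > 0:
            prev = tokens[i - 1]
            starts_period = (
                freq[tok.lower()] > freq[prev.lower()]
                and len(tok) <= len(prev)
            )
            if starts_period and current:
                chunks.append(current)
                current = []
        current.append(tok)
    if current:
        chunks.append(current)
    return chunks


text = "the cat sat on the mat and the dog slept on the rug"
tokens = text.split()
chunks = endogenous_chunks(tokens)
for chunk in chunks:
    print(" ".join(chunk))
```

On such a short text the frequency counts are very coarse; a longer input gives the length and frequency signals more room to separate function words from content words.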