Paper: Learning a Syntagmatic and Paradigmatic Structure from Language Data with a Bi-Multigram Model

ACL ID P98-1047
Title Learning a Syntagmatic and Paradigmatic Structure from Language Data with a Bi-Multigram Model
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1998
Authors

In this paper, we present a stochastic language modeling tool which aims at retrieving variable-length phrases (multigrams), assuming bigram dependencies between them. The phrase retrieval can be intermixed with a phrase clustering procedure, so that the language data are iteratively structured at both a paradigmatic and a syntagmatic level in a fully integrated way. Perplexity results on ATR travel arrangement data with a bi-multigram model (assuming bigram correlations between the phrases) come very close to the trigram scores with a reduced number of entries in the language model. Also, the ability of the class version of the model to merge semantically related phrases into a common class is illustrated.
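To make the idea of variable-length phrases with bigram dependencies between them concrete, the following is a minimal sketch, not the authors' implementation. It assumes a table of phrase-bigram log-probabilities is already available and searches for the most likely segmentation of a sentence into phrases of at most max_len words. The function name best_segmentation, the floor penalty for unseen phrase pairs, the "<s>" start symbol, and the toy probabilities are all hypothetical choices made for illustration; the estimation of the probabilities and the phrase clustering step described in the abstract are not shown.

```python
import math
from typing import Dict, List, Optional, Tuple

Phrase = Tuple[str, ...]
State = Tuple[int, Phrase]  # (position reached, last phrase emitted)

START: Phrase = ("<s>",)  # hypothetical sentence-start symbol


def best_segmentation(
    words: List[str],
    bigram_logp: Dict[Tuple[Phrase, Phrase], float],
    max_len: int = 3,
    floor: float = -20.0,  # assumed log-prob for unseen phrase pairs
) -> Tuple[List[Phrase], float]:
    """Viterbi search for the best phrase segmentation under a
    phrase-bigram model: score = sum of log p(phrase_t | phrase_{t-1})."""
    n = len(words)
    # best[state] = (best log-score reaching state, previous state)
    best: Dict[State, Tuple[float, Optional[State]]] = {(0, START): (0.0, None)}

    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            phrase = tuple(words[start:end])
            # Extend every hypothesis that ends exactly where this phrase starts.
            for (pos, prev), (score, _) in list(best.items()):
                if pos != start:
                    continue
                cand = score + bigram_logp.get((prev, phrase), floor)
                key = (end, phrase)
                if key not in best or cand > best[key][0]:
                    best[key] = (cand, (pos, prev))

    # Choose the best hypothesis that covers the whole sentence.
    score, state = max((v[0], k) for k, v in best.items() if k[0] == n)

    # Follow back-pointers to recover the phrase sequence.
    segments: List[Phrase] = []
    cursor: Optional[State] = state
    while cursor is not None and cursor != (0, START):
        segments.append(cursor[1])
        cursor = best[cursor][1]
    segments.reverse()
    return segments, score


# Toy phrase-bigram table (invented values, for illustration only).
toy_logp = {
    (START, ("i", "would", "like")): math.log(0.5),
    (("i", "would", "like"), ("a", "room")): math.log(0.5),
}

segments, score = best_segmentation("i would like a room".split(), toy_logp)
print(segments, score)
```

The search state pairs the current position with the last phrase, because under the bigram assumption the score of the next phrase depends only on the phrase immediately preceding it. A class-based variant would replace the phrase-pair lookup with a class-pair probability times a phrase-given-class probability, which this sketch does not attempt.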