Paper: A Stochastic Language Model using Dependency and its Improvement by Word Clustering

ACL ID P98-2148
Title A Stochastic Language Model using Dependency and its Improvement by Word Clustering
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1998
Authors

In this paper, we present a stochastic language model for Japanese using dependency. The prediction unit in this model is an attribute of a "bunsetsu". This attribute is represented by the product of the head of the content words and that of the function words. The relation between the attributes of "bunsetsu" is governed by a context-free grammar. The word sequences are predicted from the attribute using a word n-gram model. The spelling of an unknown word is predicted using a character n-gram model. This model is robust in that it can compute the probability of an arbitrary string, and it is complete in that it models everything from unknown words to dependency at the same time.
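
The two lower layers described in the abstract (words predicted from a bunsetsu attribute, and unknown-word spellings predicted character by character) can be illustrated with a minimal sketch. This is not the authors' implementation. The bigram order, the add-one smoothing, the class names `CharBigram` and `WordBigram`, the attribute labels, and the romanized toy data are all assumptions made for illustration, and the grammar over bunsetsu attributes is omitted entirely.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"  # hypothetical boundary/unknown symbols


class CharBigram:
    """Character bigram model for spelling unknown words.

    Add-one smoothing is an assumption of this sketch, not the paper's method.
    """

    def __init__(self, words, alphabet_size):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.v = alphabet_size + 1  # possible next symbols: alphabet + EOS
        for w in words:
            chars = [BOS] + list(w) + [EOS]
            for a, b in zip(chars, chars[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def logprob(self, word):
        chars = [BOS] + list(word) + [EOS]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + self.v))
            for a, b in zip(chars, chars[1:])
        )


class WordBigram:
    """Word bigram model conditioned on a bunsetsu attribute."""

    def __init__(self, bunsetsu_list, vocab):
        # bunsetsu_list: iterable of (attribute, [words]) pairs
        self.vocab = set(vocab)
        self.bigrams = Counter()
        self.contexts = Counter()
        self.v = len(self.vocab) + 2  # possible next tokens: vocab + UNK + EOS
        for attr, words in bunsetsu_list:
            toks = self._tokens(words)
            for a, b in zip(toks, toks[1:]):
                self.bigrams[(attr, a, b)] += 1
                self.contexts[(attr, a)] += 1

    def _tokens(self, words):
        # Out-of-vocabulary words are replaced by the UNK symbol.
        return [BOS] + [w if w in self.vocab else UNK for w in words] + [EOS]

    def logprob(self, attr, words, char_model):
        toks = self._tokens(words)
        lp = sum(
            math.log(
                (self.bigrams[(attr, a, b)] + 1)
                / (self.contexts[(attr, a)] + self.v)
            )
            for a, b in zip(toks, toks[1:])
        )
        # Each unknown word additionally pays for its spelling.
        lp += sum(char_model.logprob(w) for w in words if w not in self.vocab)
        return lp


# Toy data (invented for this sketch).
train = [
    ("N-ga", ["neko", "ga"]),
    ("N-ga", ["inu", "ga"]),
    ("V-ta", ["hashit", "ta"]),
]
vocab = {"neko", "inu", "ga", "hashit", "ta"}
char_model = CharBigram(["tori", "sakana"], alphabet_size=26)
word_model = WordBigram(train, vocab)

known = word_model.logprob("N-ga", ["neko", "ga"], char_model)
unknown = word_model.logprob("N-ga", ["kitsune", "ga"], char_model)
```

In this sketch a bunsetsu containing an out-of-vocabulary word such as `kitsune` still receives a finite log-probability, because the unknown-word symbol is scored by the word model and its spelling by the character model. This mirrors the robustness property claimed in the abstract, under the simplifying assumptions stated above.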