Paper: POS induction with distributional and morphological information using a distance-dependent Chinese restaurant process

ACL ID P14-2044
Title POS induction with distributional and morphological information using a distance-dependent Chinese restaurant process
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

We present a new approach to inducing the syntactic categories of words, combining their distributional and morphological properties in a joint nonparametric Bayesian model based on the distance-dependent Chinese Restaurant Process. The prior distribution over word clusterings uses a log-linear model of morphological similarity; the likelihood function is the probability of generating vector word embeddings. The weights of the morphology model are learned jointly while inducing part-of-speech clusters, encouraging them to cohere with the distributional features. The resulting algorithm outperforms competitive alternatives on English POS induction.
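
As a rough illustration of the prior described above, the following is a minimal sketch of a distance-dependent Chinese Restaurant Process in which each word chooses another word to "follow" with probability proportional to a log-linear score of their morphological similarity, and clusters are the connected components of the resulting links. It is not the authors' implementation: the shared-suffix features, the function names (`suffix_features`, `follow_probs`, `clusters_from_links`), the weight values, and the self-link parameter `alpha` are all assumptions made for illustration, and the embedding likelihood and the joint learning of the weights are omitted.

```python
import math

def suffix_features(a, b, max_len=3):
    """Hypothetical features: 1.0 for each suffix length 1..max_len shared by a and b."""
    return [1.0 if len(a) >= k and len(b) >= k and a[-k:] == b[-k:] else 0.0
            for k in range(1, max_len + 1)]

def follow_probs(words, i, weights, alpha=1.0):
    """Prior over which word index word i links to.

    A link to another word j gets mass exp(weights . features(i, j));
    a self-link (j == i) gets mass alpha. Assumed form, for illustration.
    """
    scores = []
    for j, other in enumerate(words):
        if j == i:
            scores.append(alpha)
        else:
            feats = suffix_features(words[i], other)
            scores.append(math.exp(sum(w * x for w, x in zip(weights, feats))))
    total = sum(scores)
    return [s / total for s in scores]

def clusters_from_links(links):
    """Return one cluster label per word: connected components of the follow graph."""
    parent = list(range(len(links)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(links))]
```

For example, with `words = ["walking", "talking", "cat"]` and uniform weights, `follow_probs(words, 0, [1.0, 1.0, 1.0])` puts most of its mass on "talking", since the two words share the suffixes "g", "ng", and "ing". Passing the links `[1, 1, 2]` to `clusters_from_links` then places "walking" and "talking" in one cluster and "cat" in another.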