Paper: Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

ACL ID D14-1113
Title Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

There is rising interest in vector-space word embeddings and their use in NLP, especially given recent methods for their fast estimation at very large scale. Nearly all this work, however, assumes a single vector per word type, ignoring polysemy and thus jeopardizing their usefulness for downstream tasks. We present an extension to the Skip-gram model that efficiently learns multiple embeddings per word type. It differs from recent related work by jointly performing word sense discrimination and embedding learning, by non-parametrically estimating the number of senses per word type, and by its efficiency and scalability. We present new state-of-the-art results in the word similarity in context task and demonstrate its scalability by training with one machine on a corpus of nea...
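
To make the idea of non-parametrically estimating the number of senses more concrete, here is a minimal sketch of one way it could work: each occurrence of a word is assigned to the most similar existing context cluster, and a new sense is opened when no cluster is similar enough. The abstract does not specify the mechanism, so the cosine-similarity criterion, the `threshold` parameter, and the `SenseClusters` class are all assumptions made for illustration, not the authors' implementation. The sketch also leaves out the Skip-gram embedding updates that the model performs jointly with sense discrimination.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class SenseClusters:
    """Hypothetical online context clusters for ONE word type.

    Each cluster stands for one induced sense. The number of clusters is
    not fixed in advance; it grows when a context is dissimilar to all
    existing cluster centers (an assumed criterion, see lead-in).
    """

    def __init__(self, threshold):
        self.threshold = threshold  # assumed similarity cutoff for opening a new sense
        self.sums = []              # running sum of context vectors, per sense
        self.counts = []            # number of contexts assigned, per sense

    def assign(self, context_vec):
        """Return the sense index for this context, creating a sense if needed."""
        best, best_sim = -1, -1.0
        for k, (s, n) in enumerate(zip(self.sums, self.counts)):
            center = [x / n for x in s]  # cluster center = mean context vector
            sim = cosine(center, context_vec)
            if sim > best_sim:
                best, best_sim = k, sim

        # No sense yet, or nothing similar enough: open a new sense.
        if best == -1 or best_sim < self.threshold:
            self.sums.append(list(context_vec))
            self.counts.append(1)
            return len(self.sums) - 1

        # Otherwise fold the context into the closest existing sense.
        self.sums[best] = [a + b for a, b in zip(self.sums[best], context_vec)]
        self.counts[best] += 1
        return best


# Toy usage: four 2-d "context vectors" that fall into two directions.
clusters = SenseClusters(threshold=0.5)
contexts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
senses = [clusters.assign(c) for c in contexts]
print(senses)
```

In this toy run the first two contexts share one sense and the last two open and share a second one, so the number of senses comes from the data instead of being set beforehand. In a full model the context vector would typically be derived from the embeddings of the surrounding words, which is again an assumption here.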