ACL ID | W13-3520 |
---|---|
Title | Polyglot: Distributed Word Representations for Multilingual NLP |
Venue | International Conference on Computational Natural Language Learning |
Session | Main Conference |
Year | 2013 |
Authors |
Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part of speech tagger for a subset of these languages. We find their performance to be competitive with near state-of-art methods in English, Danish and Swedish. Moreover, we investigate the semantic features captured by these embeddings through the proximity of word groupings. We will release these embeddings publicly to help researchers in the development and enhancement of multilingual applications.
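The abstract mentions inspecting semantic features through the proximity of word groupings. A minimal sketch of one common way to do such an inspection is a nearest-neighbor lookup by cosine similarity. The abstract does not say which distance measure the authors used, so cosine similarity is an assumption here, and the function names and the tiny hand-made vectors below are hypothetical, not the released Polyglot embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, k=3):
    """Return the k words whose vectors are most similar to `word`.

    `embeddings` is a plain dict mapping word -> list of floats
    (a hypothetical stand-in for a real embedding table).
    """
    query = embeddings[word]
    # Score every other word against the query, skipping the query itself.
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    # Highest similarity first.
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy 3-dimensional vectors, made up for illustration only.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.2, 0.9],
    "banana": [0.2, 0.1, 0.9],
}

print(nearest_neighbors("king", toy_embeddings, k=1))
print(nearest_neighbors("apple", toy_embeddings, k=1))
```

With real embeddings the dictionary would hold one vector per vocabulary word, and a vectorized library would likely replace the pure-Python loop for speed.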