Paper: Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure

ACL ID N12-1052
Title Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred t...