Paper: Semi-Supervised Training Of A Kernel PCA-Based Model For Word Sense Disambiguation

ACL ID: C04-1190
Title: Semi-Supervised Training Of A Kernel PCA-Based Model For Word Sense Disambiguation
Venue: International Conference on Computational Linguistics
Session: Main Conference
Year: 2004
Authors:

In this paper, we introduce a new semi-supervised learning model for word sense disambiguation (WSD) based on Kernel Principal Component Analysis (KPCA), with experiments showing that it can further improve accuracy over supervised KPCA models, which have already achieved WSD accuracy superior to the best published individual models. Although empirical results with supervised KPCA models demonstrate significantly better accuracy than the state of the art achieved by either naïve Bayes or maximum entropy models on Senseval-2 data, we identify specific sparse data conditions under which supervised KPCA models deteriorate to essentially a most-frequent-sense predictor. We discuss the potential of KPCA for leveraging unannotated data for partially-unsupervised training to address these issues, l...
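The abstract does not specify the model's details, so what follows is only a minimal sketch of the general idea, not the authors' implementation. Kernel PCA is fit on the union of sense-tagged and untagged context vectors, which is one plausible way to let unannotated data shape the principal axes. Each test context is then assigned the sense whose labeled centroid is nearest in the projected space. The Gaussian (RBF) kernel, the `gamma` value, the nearest-centroid decision rule, the toy binary context features, and every name below (`KernelPCA`, `predict_sense`, `bank/finance`, `bank/river`) are illustrative assumptions.

```python
# Illustrative sketch only: RBF kernel, nearest-centroid rule and toy data are
# assumptions, not the method described in the paper.
import numpy as np


def rbf_kernel(X, Y, gamma):
    # Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2), computed pairwise.
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelPCA:
    """Minimal kernel PCA with an assumed RBF kernel (hypothetical helper)."""

    def __init__(self, n_components=2, gamma=0.5):
        self.n_components = n_components
        self.gamma = gamma

    def fit(self, X):
        n = X.shape[0]
        self.X_fit = X
        self.K_fit = rbf_kernel(X, X, self.gamma)
        # Center the kernel matrix, i.e. center the data in feature space.
        one_n = np.full((n, n), 1.0 / n)
        Kc = (self.K_fit - one_n @ self.K_fit - self.K_fit @ one_n
              + one_n @ self.K_fit @ one_n)
        eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
        top = np.argsort(eigvals)[::-1][: self.n_components]
        # Scale eigenvectors so each principal axis has unit norm in feature space.
        self.alphas = eigvecs[:, top] / np.sqrt(eigvals[top])
        return self

    def transform(self, Y):
        n = self.X_fit.shape[0]
        Ky = rbf_kernel(Y, self.X_fit, self.gamma)
        # Center test-kernel rows using the training-set statistics.
        one_m = np.full((Y.shape[0], n), 1.0 / n)
        one_n = np.full((n, n), 1.0 / n)
        Kyc = (Ky - one_m @ self.K_fit - Ky @ one_n
               + one_m @ self.K_fit @ one_n)
        return Kyc @ self.alphas


def predict_sense(kpca, X_lab, y_lab, X_test):
    # Hypothetical decision rule: nearest sense centroid in KPCA space.
    Z_lab, Z_test = kpca.transform(X_lab), kpca.transform(X_test)
    senses = np.unique(y_lab)
    centroids = np.stack([Z_lab[y_lab == s].mean(axis=0) for s in senses])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return senses[np.argmin(dists, axis=1)]


# Toy binary context features: columns 0-2 are "finance" cue words,
# columns 3-5 are "river" cue words (purely illustrative data).
X_lab = np.array([[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
                  [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1]], dtype=float)
y_lab = np.array(["bank/finance", "bank/finance", "bank/river", "bank/river"])
X_unlab = np.array([[1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 1]], dtype=float)

# Semi-supervised step: the projection is fit on labeled + unlabeled contexts.
kpca = KernelPCA(n_components=2, gamma=0.5).fit(np.vstack([X_lab, X_unlab]))
X_test = np.array([[0, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0]], dtype=float)
pred = predict_sense(kpca, X_lab, y_lab, X_test)
print(pred.tolist())  # -> ['bank/finance', 'bank/river']
```

Fitting the projection on both labeled and unlabeled rows is what makes this sketch semi-supervised: the unlabeled contexts influence the principal axes, while only the labeled contexts define the sense centroids used for prediction.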