Paper: A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

ACL ID D10-1039
Title A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors

Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to beforehand create an extensive training corpus, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based on the analysis of co-occurring features in unlabeled data, which is then taken into account for extending the feature vectors given to a classifier. Our experimental results on the RST Discourse Treebank corpus and Penn Discourse Treebank indicate that the proposed method brings a significant improvement in classification accuracy and macro-average F-score when small traini...
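
The general idea of extending feature vectors with features that co-occur in unlabeled data can be illustrated with a minimal sketch. This is not the authors' implementation: the raw-count co-occurrence statistic, the function names `cooccurrence_counts` and `extend_feature_vector`, and the `top_k` and `weight` parameters are all assumptions chosen for illustration, and the paper itself may use a different association measure and weighting scheme.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(unlabeled_instances):
    """Count how often each pair of features appears in the same unlabeled instance.

    Hypothetical helper: raw pair counts stand in for whatever
    co-occurrence statistic the paper actually uses.
    """
    pair_counts = defaultdict(Counter)
    for features in unlabeled_instances:
        # Deduplicate so a repeated feature is counted once per instance.
        for a, b in combinations(sorted(set(features)), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    return pair_counts


def extend_feature_vector(features, pair_counts, top_k=2, weight=0.5):
    """Return a sparse vector (dict) with the original features at 1.0
    plus up to top_k co-occurring features at a reduced weight.

    top_k and weight are illustrative assumptions, not values from the paper.
    """
    vector = {f: 1.0 for f in features}
    scores = Counter()
    for f in features:
        for other, count in pair_counts.get(f, {}).items():
            if other not in vector:
                scores[other] += count
    total = sum(scores.values())
    # If nothing co-occurs, most_common() is empty and no division happens.
    for other, score in scores.most_common(top_k):
        vector[other] = weight * score / total
    return vector


if __name__ == "__main__":
    # Toy unlabeled data: each instance is a bag of surface features.
    unlabeled = [
        ["but", "however"],
        ["but", "however", "although"],
        ["because", "since"],
        ["but", "however"],
    ]
    counts = cooccurrence_counts(unlabeled)
    print(extend_feature_vector(["but"], counts, top_k=1))
```

In this sketch, a labeled instance that contains only a rare feature still receives non-zero values for features that frequently accompany it in unlabeled text, which is the intuition behind helping a classifier on infrequent relations when labeled training data is scarce.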