Paper: The Effect Of Corpus Size In Combining Supervised And Unsupervised Training For Disambiguation

ACL ID P06-2004
Title The Effect Of Corpus Size In Combining Supervised And Unsupervised Training For Disambiguation
Venue Annual Meeting of the Association for Computational Linguistics
Session Poster Session
Year 2006
Authors

We investigate the effect of corpus size in combining supervised and unsupervised learning for two types of attachment decisions: relative clause attachment and prepositional phrase attachment. The supervised component is Collins’ parser, trained on the Wall Street Journal. The unsupervised component gathers lexical statistics from an unannotated corpus of newswire text. We find that the combined system only improves the performance of the parser for small training sets. Surprisingly, the size of the unannotated corpus has little effect due to the noisiness of the lexical statistics acquired by unsupervised learning.