Paper: Applying Co-Training Methods To Statistical Parsing

ACL ID: N01-1023
Title: Applying Co-Training Methods To Statistical Parsing
Venue: Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session: Main Conference
Year: 2001
  • Anoop Sarkar (University of Pennsylvania, Philadelphia PA)

We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures for each word in the training set, and a large pool of unlabeled text. The algorithm iteratively labels the entire data set with parse trees. Using empirical results based on parsing the Wall Street Journal corpus, we show that training a statistical parser on the combined labeled and unlabeled data strongly outperforms training only on the labeled data.
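To make the idea of "iteratively labeling" unlabeled data more concrete, here is a minimal sketch of a generic two-view co-training loop. It is not the paper's parser: the names `train_view` and `co_train`, the two-feature example format, and the toy majority-label classifiers standing in for real statistical models are all hypothetical assumptions. In each round, each view trains on the current labeled set, labels the remaining pool, and adds its `k` most confident predictions to the shared labeled set, so the other view is retrained on them.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Optional, Tuple

# Hypothetical types: each example has two "views" (feature values),
# and a label is any hashable value.
Example = Tuple[Hashable, Hashable]
Labeled = List[Tuple[Example, Hashable]]
Model = Callable[[Hashable], Tuple[Optional[Hashable], float]]


def train_view(labeled: Labeled, view: int) -> Model:
    """Toy per-view model (an assumption, not the paper's model):
    predict the majority label seen with a feature value.

    Confidence is that label's relative frequency; unseen values get (None, 0.0).
    """
    counts = defaultdict(Counter)
    for x, y in labeled:
        counts[x[view]][y] += 1

    def predict(feature: Hashable) -> Tuple[Optional[Hashable], float]:
        if feature not in counts:
            return None, 0.0
        label, n = counts[feature].most_common(1)[0]
        return label, n / sum(counts[feature].values())

    return predict


def co_train(labeled: Labeled, unlabeled: List[Example],
             rounds: int = 5, k: int = 1) -> Tuple[Labeled, List[Example]]:
    """Each round, each view's model labels the pool and moves its k most
    confident predictions into the shared labeled set."""
    labeled = list(labeled)  # copy so the caller's seed data is not mutated
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        for view in (0, 1):
            model = train_view(labeled, view)  # retrain on the growing labeled set
            scored = []
            for x in pool:
                label, conf = model(x[view])
                if label is not None:
                    scored.append((conf, label, x))
            scored.sort(key=lambda t: t[0], reverse=True)  # most confident first
            for conf, label, x in scored[:k]:
                labeled.append((x, label))
                pool.remove(x)
    return labeled, pool


if __name__ == "__main__":
    seed = [(("a", "x"), "N"), (("b", "y"), "V")]
    grown, left = co_train(seed, [("a", "z"), ("c", "y"), ("c", "z")])
    # ("c", "y") is labeled by view 1; view 0 then labels ("c", "z").
    print(dict(grown), left)
```

In this toy, `k` controls how many predictions each view contributes per round, and confidence is just a relative frequency. A parser-based version would presumably need a confidence score for whole parse trees, which this sketch does not attempt.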