Paper: Partial Training For A Lexicalized-Grammar Parser

ACL ID N06-1019
Title Partial Training For A Lexicalized-Grammar Parser
Venue Human Language Technologies
Session Main Conference
Year 2006

We propose a solution to the annotation bottleneck for statistical parsing, by exploiting the lexicalized nature of Combinatory Categorial Grammar (CCG). The parsing model uses predicate-argument dependencies for training, which are derived from sequences of CCG lexical categories rather than full derivations. A simple method is used for extracting dependencies from lexical category sequences, resulting in high precision, yet incomplete and noisy data. The dependency parsing model of Clark and Curran (2004b) is extended to exploit this partial training data. Remarkably, the accuracy of the parser trained on data derived from category sequences alone is only 1.3% worse in terms of F-score than the parser trained on complete dependency structures.
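
To make the terms "high precision, yet incomplete" and "F-score" concrete, the following is a minimal sketch of how precision, recall, and F-score can be computed over sets of predicate-argument dependencies. The tuple layout (head word, lexical category, argument slot, argument word), the function name `dependency_prf`, and the toy sentence are assumptions made for illustration only; they are not the paper's data format or evaluation code.

```python
from typing import Set, Tuple

# Hypothetical dependency representation (an assumption, not the paper's format):
# (head word, CCG lexical category of the head, argument slot, argument word)
Dependency = Tuple[str, str, int, str]


def dependency_prf(gold: Set[Dependency], predicted: Set[Dependency]):
    """Return (precision, recall, F-score) of predicted dependencies against gold."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-score here is the balanced harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example for "John likes Mary": the complete structure has two dependencies.
gold = {
    ("likes", "(S\\NP)/NP", 1, "John"),
    ("likes", "(S\\NP)/NP", 2, "Mary"),
}

# A partial extraction that recovers only one of them, and recovers it correctly.
partial = {
    ("likes", "(S\\NP)/NP", 2, "Mary"),
}

precision, recall, f_score = dependency_prf(gold, partial)
print(f"precision={precision:.3f} recall={recall:.3f} f_score={f_score:.3f}")
```

In this toy case every extracted dependency is correct, so precision is perfect, while recall is lower because one gold dependency is missing. That is the sense in which partial data can be precise yet incomplete.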