Paper: Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank

ACL ID C10-1122
Title Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2010
Authors

Automated conversion has allowed the development of wide-coverage corpora for a variety of grammar formalisms without the expense of manual annotation. Analysing new languages also tests formalisms, exposing their strengths and weaknesses. We present Chinese CCGbank, a 760,000 word corpus annotated with Combinatory Categorial Grammar (CCG) derivations, induced automatically from the Penn Chinese Treebank (PCTB). We design parsimonious CCG analyses for a range of Chinese syntactic constructions, and transform the PCTB trees to produce them. Our process yields a corpus of 27,759 derivations, covering 98.1% of the PCTB.