Paper: Adapting a WSJ-Trained Parser to Grammatically Noisy Text

ACL ID P08-2056
Title Adapting a WSJ-Trained Parser to Grammatically Noisy Text
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2008
Authors

We present a robust parser which is trained on a treebank of ungrammatical sentences. The treebank is created automatically by modify- ing Penn treebank sentences so that they con- tain one or more syntactic errors. We eval- uate an existing Penn-treebank-trained parser on the ungrammatical treebank to see how it reacts to noise in the form of grammatical er- rors. We re-train this parser on the training section of the ungrammatical treebank, lead- ing to an significantly improved performance on the ungrammatical test sets. We show how a classifier can be used to prevent performance degradation on the original grammatical data.