Paper: A Novel Discourse Parser Based on Support Vector Machine Classification

ACL ID P09-1075
Title A Novel Discourse Parser Based on Support Vector Machine Classification
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009
Authors
  • David duVerle (National Institute of Informatics (NII), Tokyo Japan; University of Paris 6, Paris France)
  • Helmut Prendinger (National Institute of Informatics (NII), Tokyo Japan)

This paper introduces a new algorithm to parse discourse within the framework of Rhetorical Structure Theory (RST). Our method is based on recent advances in the field of statistical machine learning (multivariate capabilities of Support Vector Machines) and a rich feature space. RST offers a formal framework for hierarchical text organization with strong applications in discourse analysis and text generation. We demonstrate automated annotation of a text with RST hierarchically organised relations, with results comparable to those achieved by specially trained human annotators. Using a rich set of shallow lexical, syntactic and structural features from the input text, our parser achieves, in linear time, 73.9% of professional annotators’ human agreement F-score. The parser is 5% to ...