Paper: Improved Information Structure Analysis of Scientific Documents Through Discourse and Lexical Constraints

ACL ID N13-1113
Title Improved Information Structure Analysis of Scientific Documents Through Discourse and Lexical Constraints
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

Inferring the information structure of scientific documents is useful for many downstream applications. Existing feature-based machine learning approaches to this task require substantial training data and suffer from limited performance. Our idea is to guide feature-based models with declarative domain knowledge encoded as posterior distribution constraints. We explore a rich set of discourse and lexical constraints which we incorporate through the Generalized Expectation (GE) criterion. Our constrained model improves the performance of existing fully and lightly supervised models. Even a fully unsupervised version of this model outperforms lightly supervised feature-based models, showing that our approach can be useful even when no labeled data is available.
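
To make the idea of a posterior distribution constraint concrete, here is a minimal sketch of one common form of a GE penalty term: the KL divergence between a target label distribution and the model's average predicted label distribution over the sentences in which a constraint feature fires. It is an illustration under stated assumptions, not the paper's implementation. The function names, the three-label setup, the cue-phrase feature, and all numbers are hypothetical, and the paper's actual constraints and objective may differ.

```python
import math

def expected_label_distribution(posteriors, has_feature):
    """Average the model's label posteriors over instances where the
    constraint feature fires (hypothetical helper)."""
    selected = [p for p, fires in zip(posteriors, has_feature) if fires]
    if not selected:
        raise ValueError("constraint feature never fires in this batch")
    n_labels = len(selected[0])
    return [sum(p[k] for p in selected) / len(selected) for k in range(n_labels)]

def kl_divergence(target, model, eps=1e-12):
    """KL(target || model); terms with zero target mass contribute nothing."""
    return sum(t * math.log(t / max(m, eps))
               for t, m in zip(target, model) if t > 0)

def ge_penalty(posteriors, has_feature, target):
    """One GE-style penalty term: how far the model's expected label
    distribution on feature-bearing instances is from the target."""
    return kl_divergence(target, expected_label_distribution(posteriors, has_feature))

# Hypothetical setup: three labels, e.g. (BACKGROUND, METHOD, RESULT).
posteriors = [
    [0.2, 0.7, 0.1],    # sentence 1, contains the assumed cue phrase
    [0.4, 0.4, 0.2],    # sentence 2, contains the assumed cue phrase
    [0.9, 0.05, 0.05],  # sentence 3, does not contain it
]
has_feature = [True, True, False]

# Assumed domain knowledge: sentences with this cue are mostly METHOD.
target = [0.1, 0.8, 0.1]

penalty = ge_penalty(posteriors, has_feature, target)
print(f"GE penalty for this constraint: {penalty:.4f}")
```

In training, a term like this would be computed for each constraint and combined with the likelihood on labeled data, or used alone when no labels are available. The penalty is zero when the model's expected distribution on the feature-bearing sentences matches the target exactly.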