Paper: Assessing the Readability of Sentences: Which Corpora and Features?

ACL ID W14-1820
Title Assessing the Readability of Sentences: Which Corpora and Features?
Venue Innovative Use of NLP for Building Educational Applications
Session
Year 2014
Authors

The paper investigates the problem of sentence readability assessment, which is modelled as a classification task, with a specific view to text simplification. In particular, it addresses two open issues connected with it, i.e. the corpora to be used for training, and the identification of the most effective features to determine sentence readability. An existing readability assessment tool developed for Italian was specialized at the level of training corpus and learning algorithm. A maximum entropy-based feature selection and ranking algorithm (grafting) was used to identify the most relevant features: it turned out that assessing the readability of sentences is a complex task, requiring a high number of features, mainly syntactic ones.
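
The abstract gives no detail on how grafting selects and ranks features, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a binary (easy vs. hard) logistic model, which is the two-class case of maximum entropy, and an L1 penalty `lam`. Starting from an empty model, it repeatedly adds the unused feature whose loss gradient is largest in magnitude, re-fits the active weights, and stops when no remaining gradient exceeds `lam`. The order of selection serves as the ranking. All names (`graft`, `fit_active`, `lam`) and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gradient(X, y, w):
    # Gradient of the mean logistic loss with respect to every weight.
    n, d = len(X), len(w)
    g = [0.0] * d
    for xi, yi in zip(X, y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        for j in range(d):
            g[j] += err * xi[j] / n
    return g

def fit_active(X, y, w, active, lam, steps=200, lr=0.5):
    # Re-optimise only the active weights with proximal gradient descent;
    # the soft-thresholding step applies the L1 penalty.
    for _ in range(steps):
        g = gradient(X, y, w)
        for j in active:
            v = w[j] - lr * g[j]
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w

def graft(X, y, lam=0.05):
    # Returns (features in order of selection, final weights).
    d = len(X[0])
    w = [0.0] * d
    active = []
    while len(active) < d:
        g = gradient(X, y, w)
        candidates = [j for j in range(d) if j not in active]
        best = max(candidates, key=lambda j: abs(g[j]))
        if abs(g[best]) <= lam:
            break  # no remaining feature is worth its L1 cost
        active.append(best)
        fit_active(X, y, w, active, lam)
    return active, w

# Toy data (assumption): feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]
y = [1, 1, 0, 0]
selected, weights = graft(X, y)
print(selected, weights)
```

On this toy input only the informative feature enters the model; the noise feature keeps a zero weight because its gradient never exceeds the penalty threshold.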