Paper: A Language Modeling Approach To Predicting Reading Difficulty

ACL ID N04-1025
Title A Language Modeling Approach To Predicting Reading Difficulty
Venue Human Language Technologies
Session Main Conference
Year 2004
Authors

We demonstrate a new research approach to the problem of predicting the reading difficulty of a text passage, by recasting readability in terms of statistical language modeling. We derive a measure based on an extension of multinomial naïve Bayes classification that combines multiple language models to estimate the most likely grade level for a given passage. The resulting classifier is not specific to any particular subject and can be trained with relatively little labeled data. We perform predictions for individual Web pages in English and compare our performance to widely-used semantic variables from traditional readability measures. We show that with minimal changes, the classifier may be retrained for use with French Web documents. For both English and French, the classifier main...
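
The core idea in the abstract, choosing the grade whose language model makes a passage most likely, can be illustrated with a minimal sketch. This is plain multinomial naïve Bayes over unigram counts with add-one smoothing, not the extension that combines multiple language models described in the paper. The function names (`train_grade_models`, `log_likelihood`, `predict_grade`), the whitespace tokenizer, and the uniform prior over grades are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_grade_models(labeled_passages):
    """Build one unigram count table per grade from (grade, text) pairs.

    Returns (counts, vocab): counts maps grade -> Counter of word counts,
    vocab is the set of all words seen in training.
    """
    counts = {}
    vocab = set()
    for grade, text in labeled_passages:
        tokens = text.lower().split()  # assumed tokenizer: whitespace split
        counts.setdefault(grade, Counter()).update(tokens)
        vocab.update(tokens)
    return counts, vocab


def log_likelihood(tokens, grade_counts, vocab_size):
    """Log-probability of tokens under one grade's add-one-smoothed unigram model.

    One extra vocabulary slot is reserved for words unseen in training.
    """
    total = sum(grade_counts.values())
    denom = total + vocab_size + 1
    return sum(math.log((grade_counts[t] + 1) / denom) for t in tokens)


def predict_grade(text, counts, vocab):
    """Return the grade whose model assigns the passage the highest likelihood.

    A uniform prior over grades is assumed, so the prior term is omitted.
    """
    tokens = text.lower().split()
    return max(counts, key=lambda g: log_likelihood(tokens, counts[g], len(vocab)))
```

With a handful of labeled passages per grade, `predict_grade(passage, counts, vocab)` returns the best-scoring grade label. Working in log space avoids numerical underflow on long passages.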