Paper: Automatic evaluation of spoken summaries: the case of language assessment

ACL ID W14-1809
Title Automatic evaluation of spoken summaries: the case of language assessment
Venue Innovative Use of NLP for Building Educational Applications
Year 2014

This paper investigates whether ROUGE, a popular metric for the evaluation of automated written summaries, can be applied to the assessment of spoken summaries produced by non-native speakers of English. We demonstrate that ROUGE, with its emphasis on the recall of information, is particularly suited to the assessment of the summarization quality of non-native speakers' responses. A standard baseline implementation of ROUGE-1 computed over the output of the automated speech recognizer has a Spearman correlation of ρ = 0.55 with experts' scores of speakers' proficiency (ρ = 0.51 for a content-vector baseline). Further increases in agreement with experts' scores can be achieved by using types instead of tokens for the computation of word frequencies for both candidate an...
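The following is a minimal sketch of the two measurements the abstract relies on: ROUGE-1 recall of a candidate (here, an ASR transcript of a spoken summary) against reference summaries, and Spearman's ρ between automatic and expert scores. It is not the authors' implementation. Several details are assumptions. The helper names `rouge1_recall`, `average_ranks` and `spearman_rho` are hypothetical. Lowercasing is the only normalization. Multiple references are pooled as in the original ROUGE-N formula. The `use_types` flag assumes that the type-based variant counts each word at most once in both the candidate and the reference, since the abstract is truncated at that point. The example sentences and scores are toy data, not results from the paper.

```python
from collections import Counter
from math import sqrt


def rouge1_recall(candidate, references, use_types=False):
    """ROUGE-1 recall of a candidate token list against reference token lists.

    use_types=True is an assumed type-based variant: each distinct word
    counts once per text, in both the candidate and every reference.
    """
    def counts(tokens):
        # Assumption: lowercasing is the only normalization applied.
        c = Counter(t.lower() for t in tokens)
        if use_types:
            c = Counter(dict.fromkeys(c, 1))
        return c

    cand = counts(candidate)
    matched = total = 0
    for ref in references:
        ref_counts = counts(ref)
        # Clipped overlap: a reference word is matched at most as many
        # times as it occurs in the candidate (missing words count as 0).
        matched += sum(min(n, cand[w]) for w, n in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when either input is constant")
    return cov / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy reference summary and toy "ASR output" (illustrative only).
    ref = "bees dance and the dance tells the other bees where food is".split()
    cand = "the bees do a dance the dance shows food".split()
    print(rouge1_recall(cand, [ref]))                  # tokens: 6/12 = 0.5
    print(rouge1_recall(cand, [ref], use_types=True))  # types: 4/9
    # Toy automatic scores vs. toy expert scores with identical rankings.
    print(spearman_rho([0.2, 0.5, 0.4, 0.9], [1, 3, 2, 4]))  # 1.0
```

The toy pair shows why the choice matters. With tokens, the repeated words "the", "dance" and "bees" in the reference make up half of the denominator. With types, each word counts once, so the score reflects how many distinct reference words the speaker covered rather than how often they were repeated.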