Paper: Evaluating language understanding accuracy with respect to objective outcomes in a dialogue system

ACL ID E12-1048
Title Evaluating language understanding accuracy with respect to objective outcomes in a dialogue system
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2012
Authors

It is not always clear how the differences in intrinsic evaluation metrics for a parser or classifier will affect the performance of the system that uses it. We investigate the relationship between the intrinsic evaluation scores of an interpretation component in a tutorial dialogue system and the learning outcomes in an experiment with human users. Following the PARADISE methodology, we use multiple linear regression to build predictive models of learning gain, an important objective outcome metric in tutorial dialogue. We show that standard intrinsic metrics such as F-score alone do not predict the outcomes well. However, we can build predictive performance functions that account for up to 50% of the variance in learning gain by combining features based on standard evaluation...
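
To make the regression setup concrete, the following is a minimal sketch of fitting a multiple linear regression and reporting the share of variance it explains (R^2). It is an illustration only, not the authors' implementation: the function names (`fit_ols`, `predict`, `r_squared`), the two example features, and all numbers are hypothetical and are not taken from the paper.

```python
# Toy illustration of a PARADISE-style performance function:
# predict an outcome (here "learning gain") from several per-user features
# with ordinary least squares, then measure variance explained (R^2).
# All feature names and values are made up for illustration.


def fit_ols(X, y):
    """Return [intercept, b1, b2, ...] by solving the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear features?)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, X):
    """Apply the fitted linear function to each feature row."""
    return [coefs[0] + sum(c * x for c, x in zip(coefs[1:], r)) for r in X]


def r_squared(y, y_hat):
    """Proportion of variance in y accounted for by the predictions."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical per-user features: (interpretation F-score,
    # proportion of utterances the system could not interpret).
    features = [
        (0.80, 0.10), (0.70, 0.25), (0.90, 0.05),
        (0.60, 0.30), (0.75, 0.20), (0.85, 0.15),
    ]
    gain = [0.55, 0.40, 0.65, 0.30, 0.42, 0.58]  # hypothetical learning gains
    coefs = fit_ols(features, gain)
    print("coefficients:", coefs)
    print("R^2:", r_squared(gain, predict(coefs, features)))
```

Comparing R^2 for a model that uses a single intrinsic metric against a model that combines several features is one simple way to see the kind of contrast the abstract describes. In practice a statistics package that also reports significance tests would be a more natural choice than this hand-rolled solver.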