Paper: Exploring Content Features for Automated Speech Scoring

ACL ID N12-1011
Title Exploring Content Features for Automated Speech Scoring
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2012

Most previous research on automated speech scoring has focused on restricted, predictable speech. For automated scoring of unrestricted spontaneous speech, speech proficiency has been evaluated primarily on aspects of pronunciation, fluency, vocabulary and language usage but not on aspects of content and topicality. In this paper, we explore features representing the accuracy of the content of a spoken response. Content features are generated using three similarity measures, including a lexical matching method (Vector Space Model) and two semantic similarity measures (Latent Semantic Analysis and Pointwise Mutual Information). All of the features exhibit moderately high correlations with human proficiency scores on human speech transcriptions. The correlations decrease somewh...
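
To illustrate the lexical matching idea named in the abstract (Vector Space Model), here is a minimal sketch of cosine similarity between a response and a reference text. It is not the paper's implementation: the abstract does not specify term weighting, tokenization, or the reference material, so raw term frequencies, whitespace tokenization, and the function name cosine_similarity are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts.

    Assumption: lowercase whitespace tokenization and unweighted counts;
    a real system would likely use a proper tokenizer and term weighting.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())

    # Dot product over the terms of text_a (missing terms count as 0).
    dot = sum(count * vec_b[term] for term, count in vec_a.items())

    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))

    # An empty text has no direction in the vector space; treat as no overlap.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical example texts, not data from the paper.
    response = "the lecture explains how bees find food"
    reference = "the professor explains how bees locate food sources"
    print(round(cosine_similarity(response, reference), 3))
```

A higher value means the response shares more vocabulary with the reference text. Purely lexical matching of this kind cannot credit paraphrases, which is one motivation for semantic measures such as the two the abstract lists alongside it.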