Paper: Using Out-of-Domain Data for Lexical Addressee Detection in Human-Human-Computer Dialog

ACL ID: N13-1022
Title: Using Out-of-Domain Data for Lexical Addressee Detection in Human-Human-Computer Dialog
Venue: Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session: Main Conference
Year: 2013
Authors:

Addressee detection (AD) is an important problem for dialog systems in human-human-computer scenarios (contexts involving multiple people and a system) because system-directed speech must be distinguished from human-directed speech. Recent work on AD (Shriberg et al., 2012) showed good results using prosodic and lexical features trained on in-domain data. In-domain data, however, is expensive to collect for each new domain. In this study we focus on lexical models and investigate how well out-of-domain data (either outside the domain, or from single-user scenarios) can fill in for matched in-domain data. We find that human-addressed speech can be modeled using out-of-domain conversational speech transcripts, and that human-computer utterances can be modeled using single-user data: ...
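The abstract describes modeling each addressee class with its own lexical data source: conversational transcripts for human-addressed speech and single-user data for computer-addressed speech. One simple way to realize that idea, assumed here for illustration and not necessarily the authors' exact system, is to train one n-gram language model per class and label an utterance by the log-likelihood ratio between them. The sketch below uses add-one smoothed bigrams over a shared vocabulary. The toy corpora, the names `BigramLM`, `addressee_score`, and `classify`, and the zero decision threshold are all hypothetical.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


def tokenize(text):
    """Lowercase whitespace tokenization (a simplification of real transcript processing)."""
    return text.lower().split()


def build_vocab(*corpora):
    """Shared vocabulary so both models assign probability over the same events."""
    vocab = {EOS, UNK}
    for corpus in corpora:
        for sentence in corpus:
            vocab.update(tokenize(sentence))
    return vocab


class BigramLM:
    """Add-one (Laplace) smoothed bigram language model."""

    def __init__(self, sentences, vocab):
        self.vocab = vocab
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for sentence in sentences:
            toks = self._wrap(sentence)
            for prev, cur in zip(toks, toks[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def _wrap(self, sentence):
        # Map out-of-vocabulary words to <unk> and add sentence boundaries.
        words = [w if w in self.vocab else UNK for w in tokenize(sentence)]
        return [BOS] + words + [EOS]

    def logprob(self, sentence):
        """Natural-log probability of the whole utterance under this model."""
        toks = self._wrap(sentence)
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(toks, toks[1:]):
            num = self.bigram_counts[(prev, cur)] + 1
            den = self.context_counts[prev] + v
            total += math.log(num / den)
        return total


def addressee_score(utt, computer_lm, human_lm):
    """Log-likelihood ratio; positive means the computer-addressed model fits better."""
    return computer_lm.logprob(utt) - human_lm.logprob(utt)


def classify(utt, computer_lm, human_lm, threshold=0.0):
    score = addressee_score(utt, computer_lm, human_lm)
    return "computer" if score > threshold else "human"


# Hypothetical stand-ins: single-user system requests vs. out-of-domain conversation.
COMPUTER_DIRECTED = [
    "show me flights to boston",
    "find restaurants near me",
    "show me the weather tomorrow",
    "play some jazz music",
]
HUMAN_DIRECTED = [
    "yeah i think that sounds good",
    "what do you want to do tonight",
    "i do not know maybe we could go out",
    "yeah that sounds fun",
]

vocab = build_vocab(COMPUTER_DIRECTED, HUMAN_DIRECTED)
computer_lm = BigramLM(COMPUTER_DIRECTED, vocab)
human_lm = BigramLM(HUMAN_DIRECTED, vocab)

if __name__ == "__main__":
    for utt in ["show me restaurants in boston", "yeah that sounds good"]:
        print(utt, "->", classify(utt, computer_lm, human_lm))
```

Because each class model is trained independently, either corpus can be swapped for a different source (for example, out-of-domain conversational transcripts in place of in-domain human-human speech) without changing the scoring rule; in practice the threshold would be tuned on held-out data rather than fixed at zero.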