Paper: Modeling Latent Biographic Attributes in Conversational Genres

ACL ID P09-1080
Title Modeling Latent Biographic Attributes in Conversational Genres
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009

This paper presents and evaluates several original techniques for the latent classification of biographic attributes such as gender, age and native language, in diverse genres (conversation transcripts, email) and languages (Arabic, English). First, we present a novel partner-sensitive model for extracting biographic attributes in conversations, given the differences in lexical usage and discourse style, such as those observed between same-gender and mixed-gender conversations. Then, we explore a rich variety of novel sociolinguistic and discourse-based features, including mean utterance length, passive/active usage, percentage domination of the conversation, speaking rate and filler word usage. Cumulatively, up to 20% error reduction is achieved relative to the standard Boulis and ...
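The abstract names several discourse features and a partner-sensitive model without giving details. The following is a minimal sketch, not the authors' implementation, of how some of those features might be computed per speaker from a time-stamped two-party transcript, with the partner's features appended so a classifier can condition on both sides of the conversation. Several parts are assumptions for illustration only. The `Utterance` type, the function names and the `FILLERS` inventory are hypothetical. "Percentage domination" is taken here to mean share of words rather than share of talk time. Passive/active usage is omitted because it requires a syntactic parser.

```python
from dataclasses import dataclass

# Hypothetical filler inventory for illustration only; not the paper's list.
FILLERS = {"um", "uh", "er", "ah", "hmm"}


@dataclass
class Utterance:
    speaker: str   # speaker id, e.g. "A" or "B"
    text: str      # transcribed words
    start: float   # start time in seconds
    end: float     # end time in seconds


def tokenize(text):
    # Whitespace split, lowercased, with leading/trailing punctuation stripped.
    return [w.strip(".,?!;:").lower() for w in text.split() if w.strip(".,?!;:")]


def speaker_features(utts, speaker):
    """Per-speaker discourse features over one conversation (assumed definitions)."""
    own = [u for u in utts if u.speaker == speaker]
    own_words = [w for u in own for w in tokenize(u.text)]
    total_words = sum(len(tokenize(u.text)) for u in utts)
    talk_time = sum(u.end - u.start for u in own)
    n = len(own_words)
    return {
        # Average number of words per utterance by this speaker.
        "mean_utterance_length": n / len(own) if own else 0.0,
        # Share of all words in the conversation spoken by this speaker (%).
        "percent_domination": 100.0 * n / total_words if total_words else 0.0,
        # Words per second of this speaker's own talk time.
        "speaking_rate_wps": n / talk_time if talk_time > 0 else 0.0,
        # Fraction of this speaker's words that are in FILLERS.
        "filler_rate": sum(w in FILLERS for w in own_words) / n if n else 0.0,
    }


def partner_sensitive_features(utts, speaker, partner):
    """Speaker's features plus the partner's, keys prefixed with 'partner_'."""
    feats = dict(speaker_features(utts, speaker))
    for k, v in speaker_features(utts, partner).items():
        feats["partner_" + k] = v
    return feats


utts = [
    Utterance("A", "Um, I think so.", 0.0, 2.0),
    Utterance("B", "Yeah.", 2.0, 2.5),
    Utterance("A", "Uh, we could go tomorrow.", 2.5, 4.5),
]
features = partner_sensitive_features(utts, "A", "B")
print(features["percent_domination"], features["partner_percent_domination"])  # 90.0 10.0
```

Appending the partner's features to the speaker's own vector is one simple way to let a downstream classifier pick up same-gender versus mixed-gender effects. Whether the paper's partner-sensitive model works this way is not stated in the abstract.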