Paper: Discriminating Gender on Twitter

ACL ID D11-1120
Title Discriminating Gender on Twitter
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011

Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several classifier types on this dataset. We show the degree to which classifier accuracy varies with tweet volume, and when various kinds of profile metadata are included in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly outperform both baseline models and almost all humans on the same task.