Paper: User Demographics and Language in an Implicit Social Network

ACL ID D12-1135
Title User Demographics and Language in an Implicit Social Network
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2012
Authors

We consider the task of predicting the gender of YouTube users and contrast two information sources: the comments they leave and the social environment induced from the affiliation graph of users and videos. We propagate gender information through the videos and show that a user's gender can be predicted from her social environment with an accuracy above 90%. We also show that gender can be predicted from language alone (89%). A surprising result of our study is that the latter predictions correlate more strongly with the gender predominant in the user's environment than with the sex of the person as reported in the profile. We also investigate how the two views (linguistic and social) can be combined and analyse how prediction accuracy changes over different age groups.
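
The abstract does not say how gender information is propagated through the videos. The following is a minimal sketch of one plausible reading, assuming a two-step scheme over the bipartite user-video graph: each video is scored by the share of its labelled users who are female, and each unlabelled user is assigned the majority gender of the videos they are linked to. The function name `propagate_gender`, the "F"/"M" labels, the 0.5 threshold, and the toy data are all hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def propagate_gender(edges, known):
    """Predict labels for unlabelled users from the videos they are linked to.

    edges: iterable of (user, video) pairs (the affiliation graph).
    known: dict mapping some users to "F" or "M" (assumed label set).
    Returns a dict mapping unlabelled users to a predicted label.
    """
    video_users = defaultdict(set)
    user_videos = defaultdict(set)
    for user, video in edges:
        video_users[video].add(user)
        user_videos[user].add(video)

    # Step 1: score each video by the fraction of its labelled users who are female.
    video_female = {}
    for video, users in video_users.items():
        labels = [known[u] for u in users if u in known]
        if labels:
            video_female[video] = labels.count("F") / len(labels)

    # Step 2: average the video scores for each unlabelled user and threshold at 0.5.
    predictions = {}
    for user, videos in user_videos.items():
        if user in known:
            continue
        scores = [video_female[v] for v in videos if v in video_female]
        if scores:  # users with no scored video get no prediction
            mean = sum(scores) / len(scores)
            predictions[user] = "F" if mean >= 0.5 else "M"
    return predictions


# Toy affiliation graph (invented data, not from the paper).
edges = [
    ("anna", "v1"), ("bea", "v1"), ("dana", "v1"),
    ("carl", "v2"), ("eli", "v2"),
    ("finn", "v3"),
]
known = {"anna": "F", "bea": "F", "carl": "M"}
predictions = propagate_gender(edges, known)
```

In this toy graph, `dana` shares a video only with users labelled female and `eli` only with a user labelled male, so they receive "F" and "M" respectively; `finn` is linked only to a video with no labelled users and receives no prediction.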