Paper: Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter

ACL ID N13-1121
Title Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

Hidden properties of social media users, such as their ethnicity, gender, and location, are often reflected in their observed attributes, such as their first and last names. Furthermore, users who communicate with each other often have similar hidden properties. We propose an algorithm that exploits these insights to cluster the observed attributes of hundreds of millions of Twitter users. Attributes such as user names are grouped together if users with those names communicate with other similar users. We separately cluster millions of unique first names, last names, and user-provided locations. The efficacy of these clusters is then evaluated on a diverse set of classification tasks that predict hidden user properties such as ethnicity, geographic location, gender, language,...