Paper: Gender Inference of Twitter Users in Non-English Contexts

ACL ID D13-1114
Title Gender Inference of Twitter Users in Non-English Contexts
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013
Authors

While much work has considered the problem of latent attribute inference for users of social media such as Twitter, little has been done on non-English-based content and users. Here, we conduct the first assessment of latent attribute inference in languages beyond English, focusing on gender inference. We find that the gender inference problem in quite diverse languages can be addressed using existing machinery. Further, accuracy gains can be made by taking language-specific features into account. We identify languages with complex orthography, such as Japanese, as difficult for existing methods, suggesting a valuable direction for future research.
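The abstract does not name the models or features behind the "existing machinery". The sketch below assumes a bag-of-features text classifier, which is an assumption and not necessarily the authors' setup. It illustrates one plausible way orthography interacts with feature extraction. Whitespace tokenization yields sensible word features for English. Japanese, however, does not separate words with spaces, so the same tokenizer collapses a whole sentence into a single "word". Overlapping character n-grams need no word segmentation. The function names `word_features` and `char_ngram_features` are hypothetical, and this is not the paper's method.

```python
from collections import Counter

# Hypothetical helpers for illustration only; not the paper's feature set.

def word_features(text: str) -> Counter:
    """Whitespace tokenization: adequate for space-delimited languages like English."""
    return Counter(text.lower().split())

def char_ngram_features(text: str, n: int = 2) -> Counter:
    """Overlapping character n-grams; requires no word segmentation."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

english = "going to the beach today"
japanese = "今日は海に行きます"  # "I am going to the sea today"; no spaces between words

print(len(word_features(english)))    # 5 distinct word tokens
print(len(word_features(japanese)))   # 1: the entire sentence becomes one token
print(len(char_ngram_features(japanese, 2)))  # 8 distinct character bigrams
```

Character n-grams are only one option. A dedicated Japanese morphological segmenter is another, and the sketch makes no claim about which approach the paper found most effective.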