Paper: Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations

ACL ID P11-1077
Title Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

We investigate whether wording, stylistic choices, and online behavior can be used to predict the age category of blog authors. Our hypothesis is that significant changes in writing style distinguish pre-social media bloggers from post-social media bloggers. Through experimentation with a range of years, we found that the birth dates of students in college at the time when social media such as AIM, SMS text messaging, MySpace and Facebook first became popular enable accurate age prediction. We also show that internet writing characteristics are important features for age prediction, but that lexical content is also needed to produce significantly more accurate results. Our best results allow for 81.57% accuracy.
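
The setup described in the abstract (a binary age category defined by a birth-year cutoff, tried over a range of years, with internet writing characteristics as features) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list the actual features, cutoff years, or classifier, so the emoticon pattern, the acronym list, and the function names `age_label`, `style_features`, and `sweep_cutoffs` are all hypothetical.

```python
import re

# Hypothetical internet-writing markers; the paper's real feature set
# is not given in the abstract.
EMOTICON = re.compile(r"[:;=][-']?[)(DPp]")
ACRONYMS = {"lol", "omg", "brb", "btw"}


def age_label(birth_year, cutoff):
    """Binary age category: 1 if born in or after the cutoff year, else 0."""
    return 1 if birth_year >= cutoff else 0


def style_features(text):
    """Count a few simple style markers in one blog post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty posts
    return {
        "tokens": len(tokens),
        "emoticons": len(EMOTICON.findall(text)),
        "acronym_rate": sum(t in ACRONYMS for t in tokens) / n,
    }


def sweep_cutoffs(birth_years, cutoffs):
    """For each candidate cutoff, count authors labeled as the younger group."""
    return {c: sum(age_label(y, c) for y in birth_years) for c in cutoffs}
```

In such a setup, each candidate cutoff from `sweep_cutoffs` would define a separate labeling of the same authors, and a classifier trained on the style features (plus lexical content, which the abstract reports is needed for the best accuracy) would be evaluated once per cutoff.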