Paper: What to do about bad language on the internet

ACL ID N13-1037
Title What to do about bad language on the internet
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013

The rise of social media has brought computational linguistics into ever-closer contact with bad language: text that defies our expectations about vocabulary, spelling, and syntax. This paper surveys the landscape of bad language and offers a critical review of the NLP community's response, which has largely followed two paths: normalization and domain adaptation. Each approach is evaluated in the context of theoretical and empirical work on computer-mediated communication. In addition, the paper presents a quantitative analysis of the lexical diversity of social media text and its relationship to other corpora.
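The abstract does not say which lexical-diversity measures the paper uses. As a purely illustrative sketch (not the paper's method), here are two simple and common ones: the type-token ratio (distinct word types divided by total tokens) and the out-of-vocabulary (OOV) rate against a reference vocabulary. The function names and the tiny example vocabulary are hypothetical.

```python
def type_token_ratio(tokens):
    """Distinct word types / total tokens; higher values mean more diverse text.
    Illustrative measure only, not necessarily the one used in the paper."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def oov_rate(tokens, vocabulary):
    """Fraction of tokens that do not appear in a reference vocabulary.
    The vocabulary is a stand-in for, e.g., a word list built from edited text."""
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t not in vocabulary)
    return missing / len(tokens)


# Hypothetical reference vocabulary and two toy "documents".
vocab = {"you", "are", "so", "great", "know"}
tweet = "u r so gr8 lol u know".split()
edited = "you are so great you know".split()

print(type_token_ratio(tweet), oov_rate(tweet, vocab))    # 6/7 types, 5/7 OOV
print(type_token_ratio(edited), oov_rate(edited, vocab))  # 5/6 types, 0 OOV
```

Note that the type-token ratio shrinks as a corpus grows, so comparisons across corpora are only meaningful on samples of equal size.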