Paper: Paraphrasing 4 Microblog Normalization

ACL ID D13-1008
Title Paraphrasing 4 Microblog Normalization
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013

Compared to the edited genres that have played a central role in NLP research, microblog texts use a more informal register with nonstandard lexical items, abbreviations, and free orthographic variation. When confronted with such input, conventional text analysis tools often perform poorly. Normalization (replacing orthographically or lexically idiosyncratic forms with more standard variants) can improve performance. We propose a method for learning normalization rules from machine translations of a parallel corpus of microblog messages. To validate the utility of our approach, we evaluate extrinsically, showing that normalizing English tweets and then translating improves translation quality (compared to translating unnormalized text) using three standard web translation service...
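To make the idea of normalization concrete, here is a minimal sketch of applying a token-level rule table to a microblog message. It is only an illustration under stated assumptions: the `RULES` dictionary and the `normalize` helper are hypothetical. They are not the rules learned in the paper and not the authors' method, which learns its rules from machine translations of parallel microblog data.

```python
import re

# Hypothetical rule table mapping informal tokens to standard variants.
# These entries are illustrative only; they are NOT rules learned in the paper.
RULES = {
    "u": "you",
    "r": "are",
    "2moro": "tomorrow",
    "4": "for",
    "pls": "please",
}

# Split text into word tokens (letters/digits/underscore) and punctuation runs.
TOKEN = re.compile(r"\w+|[^\w\s]+")


def normalize(text: str, rules: dict[str, str]) -> str:
    """Replace each token that has a rule with its standard form.

    Lookup is case-insensitive; tokens without a rule are kept unchanged.
    Output tokens are re-joined with single spaces.
    """
    out = []
    for tok in TOKEN.findall(text):
        out.append(rules.get(tok.lower(), tok))
    return " ".join(out)


print(normalize("r u coming 2moro ?", RULES))  # prints: are you coming tomorrow ?
```

A context-free table like this is a deliberate simplification. The rule `"4" -> "for"` would wrongly rewrite "I have 4 cats", which shows why rules that take context into account, such as those a system might learn from parallel data, can be preferable to a fixed lookup.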