Paper: Linguistic Redundancy in Twitter

ACL ID D11-1061
Title Linguistic Redundancy in Twitter
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011

In the last few years, the interest of the research community in micro-blogs and social media services, such as Twitter, has grown exponentially. Yet, so far not much attention has been paid to a key characteristic of micro-blogs: the high level of information redundancy. The aim of this paper is to systematically approach this problem by providing an operational definition of redundancy. We cast redundancy in the framework of Textual Entailment Recognition. We also provide quantitative evidence on the pervasiveness of redundancy in Twitter, and describe a dataset of redundancy-annotated tweets. Finally, we present a general-purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully solves the redundancy de...