Paper: Unsupervised Modeling of Twitter Conversations

ACL ID N10-1020
Title Unsupervised Modeling of Twitter Conversations
Venue Human Language Technologies
Session Main Conference
Year 2010
Authors

We propose the first unsupervised approach to the problem of modeling dialogue acts in an open domain. Trained on a corpus of noisy Twitter conversations, our method discovers dialogue acts by clustering raw utterances. Because it accounts for the sequential behaviour of these acts, the learned model can provide insight into the shape of communication in a new medium. We address the challenge of evaluating the emergent model with a qualitative visualization and an intrinsic conversation ordering task. This work is inspired by a corpus of 1.3 million Twitter conversations, which will be made publicly available. This huge amount of data, available only because Twitter blurs the line between chatting and publishing, highlights the need to be able to adapt quickly to a new medium...
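
The conversation ordering task mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the act labels, the transition table `trans`, and the helper names are hypothetical, and the use of Kendall's tau as the ordering score is an assumption, since the abstract does not name a metric. The idea is that a model of act sequences should assign its highest probability to the true order of a shuffled conversation.

```python
from itertools import permutations
from math import log


def kendall_tau(order):
    """Kendall's tau between `order` (a permutation of 0..n-1) and the true order 0..n-1."""
    n = len(order)
    total_pairs = n * (n - 1) // 2
    discordant = sum(
        1 for i in range(n) for j in range(i + 1, n) if order[i] > order[j]
    )
    return 1.0 - 2.0 * discordant / total_pairs


def sequence_log_prob(acts, trans, start="<s>"):
    """Log-probability of an act sequence under a first-order transition table."""
    total, prev = 0.0, start
    for act in acts:
        total += log(trans[prev][act])
        prev = act
    return total


def best_ordering(acts, trans):
    """Brute-force search for the permutation the model scores highest.

    Only practical for short conversations, since it tries all n! orders.
    """
    return max(
        permutations(range(len(acts))),
        key=lambda p: sequence_log_prob([acts[i] for i in p], trans),
    )


# Toy, hand-written transition probabilities (assumed, not learned from data).
trans = {
    "<s>":      {"question": 0.8, "answer": 0.1, "thanks": 0.1},
    "question": {"question": 0.1, "answer": 0.8, "thanks": 0.1},
    "answer":   {"question": 0.2, "answer": 0.2, "thanks": 0.6},
    "thanks":   {"question": 0.4, "answer": 0.3, "thanks": 0.3},
}
acts = ["question", "answer", "thanks"]  # acts in their true order

predicted = best_ordering(acts, trans)
tau = kendall_tau(predicted)
```

A tau of 1 means the model recovered the true order exactly, and a tau of -1 means it produced the exact reverse. In a real evaluation the act labels would come from the unsupervised clustering rather than being given by hand.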