Paper: Named Entity Recognition in Tweets: An Experimental Study

ACL ID: D11-1141
Title: Named Entity Recognition in Tweets: An Experimental Study
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2011

People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses the problem by rebuilding the NLP pipeline, from part-of-speech tagging through chunking to named-entity recognition. Our novel T-NER system doubles the F1 score of the Stanford NER system. T-NER achieves this by leveraging the redundancy inherent in tweets, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://
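The abstract says LabeledLDA uses Freebase dictionaries as distant supervision. One way to picture that mechanism: treat each entity string as a "document" whose words are the contexts it appears in, and restrict each entity's possible types to the set its dictionary entries allow (any type if it is in no dictionary). The following is a minimal sketch of that idea with collapsed Gibbs sampling, assuming toy data; it is not the authors' implementation, and the function name `labeled_lda`, the hyperparameters `alpha`/`beta`, the type list, and the dictionary contents are all hypothetical.

```python
import random
from collections import defaultdict


def labeled_lda(entity_contexts, dictionary, types, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical LabeledLDA-style sampler with dictionary-constrained types.

    entity_contexts: entity string -> list of context words (one "document" per entity).
    dictionary: entity string -> set of candidate types (stand-in for Freebase lists).
    Entities absent from the dictionary may take any type.
    Returns entity -> {type: estimated probability}.
    """
    rng = random.Random(seed)
    vocab_size = len({w for words in entity_contexts.values() for w in words})
    # The distant-supervision constraint: each entity samples only from its allowed types.
    allowed = {e: sorted(dictionary.get(e) or types) for e in entity_contexts}
    n_et = defaultdict(int)  # tokens of entity e assigned to type t
    n_tw = defaultdict(int)  # times word w is assigned to type t
    n_t = defaultdict(int)   # total tokens assigned to type t
    z = {}                   # (entity, position) -> current type assignment

    # Random initialization within each entity's allowed set.
    for e, words in entity_contexts.items():
        for i, w in enumerate(words):
            t = rng.choice(allowed[e])
            z[e, i] = t
            n_et[e, t] += 1
            n_tw[t, w] += 1
            n_t[t] += 1

    for _ in range(iters):
        for e, words in entity_contexts.items():
            for i, w in enumerate(words):
                # Remove the current assignment from the counts.
                t = z[e, i]
                n_et[e, t] -= 1
                n_tw[t, w] -= 1
                n_t[t] -= 1
                # Resample among allowed types only: p(t) ~ (n_et + a) * (n_tw + b) / (n_t + V*b).
                cands = allowed[e]
                weights = [
                    (n_et[e, c] + alpha) * (n_tw[c, w] + beta) / (n_t[c] + vocab_size * beta)
                    for c in cands
                ]
                t = rng.choices(cands, weights=weights)[0]
                z[e, i] = t
                n_et[e, t] += 1
                n_tw[t, w] += 1
                n_t[t] += 1

    # Per-entity type distribution; disallowed types get exactly zero mass.
    theta = {}
    for e in entity_contexts:
        cands = allowed[e]
        total = sum(n_et[e, c] + alpha for c in cands)
        theta[e] = {t: (n_et[e, t] + alpha) / total if t in cands else 0.0 for t in types}
    return theta


# Toy usage (hypothetical data).
TYPES = ["COMPANY", "LOCATION", "PERSON", "PRODUCT"]
freebase_like = {"apple": {"COMPANY", "PRODUCT"}, "paris": {"LOCATION", "PERSON"}}
contexts = {
    "apple": ["released", "new", "iphone", "stock"],
    "paris": ["flight", "to", "hotel", "in"],
    "kindle": ["bought", "new", "reading"],  # not in the dictionary: any type allowed
}
theta = labeled_lda(contexts, freebase_like, TYPES)
print(theta["apple"])
```

The design choice to illustrate is the hard constraint: an ambiguous string such as "apple" can only split its probability between the types its dictionary entries permit, and the shared context-word statistics decide how to split it.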