Paper: Automatically Constructing a Normalisation Dictionary for Microblogs

ACL ID D12-1039
Title Automatically Constructing a Normalisation Dictionary for Microblogs
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2012
Authors

Microblog normalisation methods often utilise complex models and struggle to differentiate between correctly-spelled unknown words and lexical variants of known words. In this paper, we propose a method for constructing a dictionary of lexical variants of known words that facilitates lexical normalisation via simple string substitution (e.g. tomorrow for tmrw). We use context information to generate possible variant and normalisation pairs and then rank these by string similarity. Highly-ranked pairs are selected to populate the dictionary. We show that a dictionary-based approach achieves state-of-the-art performance for both F-score and word error rate on a standard dataset. Compared with other methods, this approach offers a fast, lightweight and easy-to-use solution, and is ...
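The pipeline summarised in the abstract (candidate pairs, ranking by string similarity, selection into a dictionary, normalisation by substitution) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the candidate pairs are supplied by hand rather than derived from context information, `difflib.SequenceMatcher` stands in for whatever string similarity measure the authors use, and the function names and the 0.45 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; not the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def build_dictionary(candidate_pairs, threshold=0.45):
    """Keep, for each variant, its most similar candidate above the threshold."""
    best = {}
    for variant, word in candidate_pairs:
        score = similarity(variant, word)
        if score >= threshold and score > best.get(variant, ("", 0.0))[1]:
            best[variant] = (word, score)
    return {variant: word for variant, (word, _) in best.items()}


def normalise(text, dictionary):
    """Normalise by plain token substitution; unknown tokens pass through."""
    return " ".join(dictionary.get(token, token) for token in text.split())


# Hypothetical candidate pairs; in the paper these come from context information.
candidates = [
    ("tmrw", "tomorrow"),
    ("tmrw", "today"),
    ("2nite", "tonight"),
    ("2nite", "morning"),
]

dictionary = build_dictionary(candidates)
print(normalise("see u tmrw or 2nite", dictionary))
```

Once the dictionary exists, normalisation is a single lookup per token, which is the property the abstract describes as fast and lightweight. Tokens with no entry, such as "u" here, are left unchanged.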