Paper: A Hybrid Rule/Model-Based Finite-State Framework for Normalizing SMS Messages

ACL ID P10-1079
Title A Hybrid Rule/Model-Based Finite-State Framework for Normalizing SMS Messages
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2010
Authors

In recent years, research in natural language processing has increasingly focused on normalizing SMS messages. Several well-defined approaches have been proposed, but the problem remains far from solved: the best systems achieve an 11% word error rate (WER). This paper presents a method that shares similarities with both spell-checking and machine-translation approaches. The normalization component of the system is based entirely on models trained on a corpus. Evaluated on French with 10-fold cross-validation, the system achieves a 9.3% WER and a BLEU score of 0.83.
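For readers unfamiliar with the headline metric, word error rate is the word-level Levenshtein distance between a system's output and the reference (the minimum number of substitutions, deletions, and insertions), divided by the number of words in the reference. The sketch below illustrates that definition. It assumes simple whitespace tokenization, and the function name `word_error_rate` is hypothetical. The abstract does not specify the paper's actual tokenization or scoring script, so this should not be read as the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substituted word out of four reference words gives a WER of 0.25.
print(word_error_rate("je suis au bureau", "je suis o bureau"))
```

Under this definition, a 9.3% WER means roughly 9.3 word-level edits per 100 reference words would be needed to turn the normalized output into the reference.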