Paper: A Graph-based Approach for Contextual Text Normalization

ACL ID D14-1037
Title A Graph-based Approach for Contextual Text Normalization
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

The informal nature of social media text renders it very difficult to be automatically processed by natural language processing tools. Text normalization, which corresponds to restoring the non-standard words to their canonical forms, provides a solution to this challenge. We introduce an unsupervised text normalization approach that utilizes not only lexical, but also contextual and grammatical features of social text. The contextual and grammatical features are extracted from a word association graph built by using a large unlabeled social media text corpus. The graph encodes the relative positions of the words with respect to each other, as well as their part-of-speech tags. The lexical features are obtained by using the longest common subsequence ratio and edit distanc...
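The two lexical measures named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: the longest common subsequence ratio is the LCS length divided by the length of the longer string, and edit distance is plain Levenshtein distance with unit costs. The function names and the example word pair are hypothetical, and the paper may define or combine these measures differently.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a: str, b: str) -> float:
    """LCS length over the longer string's length (assumed definition)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical non-standard word and candidate canonical form.
noisy, candidate = "tmrw", "tomorrow"
print(lcs_ratio(noisy, candidate), edit_distance(noisy, candidate))
```

Under these assumptions, a higher ratio and a lower distance both suggest that a candidate is lexically close to the non-standard word; how such scores would be weighed against the graph-based contextual features is not specified in the text above.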