Paper: Improving Text Normalization via Unsupervised Model and Discriminative Reranking

ACL ID P14-3012
Title Improving Text Normalization via Unsupervised Model and Discriminative Reranking
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

Various models have been developed for normalizing informal text. In this paper, we propose two methods to improve normalization performance. The first is an unsupervised approach that automatically identifies pairs of a non-standard token and a proper word from a large unlabeled corpus. We use semantic similarity based on continuous word vector representations, together with other surface similarity measurements. Second, we propose a reranking strategy to combine the results from different systems. This allows us to incorporate information that is hard to model in individual systems, as well as to consider multiple systems when generating a final rank for a test case. Both word- and sentence-level optimization schemes are explored in this study. We evaluate our approach on data sets used...
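The two ideas in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; it only assumes the general shape described above. Candidate pairs are scored by mixing a semantic similarity (cosine similarity of word vectors) with a surface similarity. The choice of a longest-common-subsequence ratio as the surface measure, the mixing weight `alpha`, the toy vectors, and all function names (`score_pair`, `best_candidates`, `rerank`) are hypothetical. The `rerank` function is a fixed-weight linear combination of per-system scores. It stands in for a discriminative reranker, which would learn such weights from labeled data rather than take them as given.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def surface_sim(token, word):
    """Assumed surface similarity: LCS length divided by the longer string's length."""
    if not token or not word:
        return 0.0
    return lcs_length(token, word) / max(len(token), len(word))


def score_pair(token, word, vectors, alpha=0.5):
    """Hypothetical linear mix of semantic (cosine) and surface (LCS ratio) similarity."""
    sem = cosine(vectors[token], vectors[word])
    return alpha * sem + (1 - alpha) * surface_sim(token, word)


def best_candidates(token, vocab, vectors, k=3, alpha=0.5):
    """Return the top-k (word, score) normalization candidates for a non-standard token."""
    scored = [(w, score_pair(token, w, vectors, alpha)) for w in vocab if w != token]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:k]


def rerank(candidate_lists, weights):
    """Combine candidates from several systems by a weighted sum of their scores.

    candidate_lists maps a system name to a list of (word, score) pairs; a candidate
    that a system does not propose contributes 0 for that system. In a discriminative
    reranker the weights would be learned; here they are supplied by hand.
    """
    combined = {}
    for system, cands in candidate_lists.items():
        w = weights.get(system, 0.0)
        for word, s in cands:
            combined[word] = combined.get(word, 0.0) + w * s
    return sorted(combined.items(), key=lambda p: p[1], reverse=True)


# Toy, made-up 3-dimensional vectors purely for illustration.
VECTORS = {
    "tmrw": [0.9, 0.1, 0.0],
    "tomorrow": [1.0, 0.0, 0.1],
    "today": [0.7, 0.6, 0.0],
    "tree": [0.0, 0.1, 1.0],
}

if __name__ == "__main__":
    vocab = ["tomorrow", "today", "tree"]
    print(best_candidates("tmrw", vocab, VECTORS))
```

With these toy vectors, "tomorrow" ranks first for "tmrw", because it scores well on both the semantic and the surface measure. "today" is semantically close but shares only one character in sequence. Mixing the two signals, rather than using either alone, is the point the abstract makes about combining semantic and surface similarity.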