Paper: Mining and Modeling Relations between Formal and Informal Chinese Phrases from Web Corpora

ACL ID D08-1108
Title Mining and Modeling Relations between Formal and Informal Chinese Phrases from Web Corpora
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2008
Authors

We present a novel method for discovering and modeling the relationship between in- formal Chinese expressions (including collo- quialisms and instant-messaging slang) and their formal equivalents. Specifically, we pro- posed a bootstrapping procedure to identify a list of candidate informal phrases in web corpora. Given an informal phrase, we re- trieve contextual instances from the web us- ing a search engine, generate hypotheses of formal equivalents via this data, and rank the hypotheses using a conditional log-linear model. In the log-linear model, we incorpo- rate as feature functions both rule-based intu- itions and data co-occurrence phenomena (ei- ther as an explicit or indirect definition, or through formal/informal usages occurring in free variation in a discourse). We test our ...