Paper: Latent Semantic Matching: Application to Cross-language Text Categorization without Alignment Information

ACL ID P13-2038
Title Latent Semantic Matching: Application to Cross-language Text Categorization without Alignment Information
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2013
Authors

Unsupervised object matching (UOM) is a promising approach to cross-language natural language processing tasks such as bilingual lexicon acquisition, parallel corpus construction, and cross-language text categorization, because it does not require labor-intensive linguistic resources. However, UOM only finds one-to-one correspondences from data sets with the same number of instances in source and target domains, and this prevents us from applying UOM to real-world cross-language natural language processing tasks. To alleviate these limitations, we propose latent semantic matching, which embeds objects in both source and target language domains into a shared latent topic space. We demonstrate the effectiveness of our method on cross-language text categorization. The results sho...