Paper: Bridging Languages through Etymology: The case of cross language text categorization

ACL ID P13-1064
Title Bridging Languages through Etymology: The case of cross language text categorization
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

We propose the hypothesis that word etymology is useful for NLP applications as a bridge between languages. We support this hypothesis with experiments in cross-language (English-Italian) document categorization. In a straightforward bag-of-words experimental set-up we add etymological ancestors of the words in the documents, and investigate the performance of a model built on English data on Italian test data (and vice versa). The results show an improvement that is not only statistically significant but also large, a jump of almost 40 points in F1-score, over the raw (vanilla bag-of-words) representation.
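
The idea of adding etymological ancestors to a bag-of-words representation can be illustrated with a minimal sketch. It is not the authors' implementation: the table TOY_ETYMOLOGY, the "lat:" prefix on ancestor features, and the helper names bag_of_words, augment and shared_features are all assumptions made for illustration, and the two toy documents are invented. The sketch only shows why shared ancestors may give English and Italian documents common features that the raw tokens lack.

```python
from collections import Counter

# Hypothetical toy etymology table (word -> ancestor forms), for illustration
# only; it is not the etymological resource used in the paper.
TOY_ETYMOLOGY = {
    "school": ["lat:schola"],   # English
    "scuola": ["lat:schola"],   # Italian
    "music": ["lat:musica"],    # English
    "musica": ["lat:musica"],   # Italian
}


def bag_of_words(tokens):
    """Plain (vanilla) bag-of-words: token -> count."""
    return Counter(tokens)


def augment(tokens, etymology):
    """Bag-of-words extended with the listed ancestors of each token."""
    features = Counter(tokens)
    for token in tokens:
        for ancestor in etymology.get(token, []):
            features[ancestor] += 1
    return features


def shared_features(a, b):
    """Feature names that two document representations have in common."""
    return set(a) & set(b)


# Invented toy documents, one per language.
english_doc = "the school music teacher".split()
italian_doc = "la scuola di musica".split()

raw_overlap = shared_features(bag_of_words(english_doc), bag_of_words(italian_doc))
etym_overlap = shared_features(
    augment(english_doc, TOY_ETYMOLOGY), augment(italian_doc, TOY_ETYMOLOGY)
)

print(sorted(raw_overlap))   # no shared raw tokens
print(sorted(etym_overlap))  # shared ancestor features
```

In this toy case the raw representations share no features, while the augmented ones share the two ancestor features, so a classifier trained on one language would have something to match on in the other. How ancestors are selected and weighted in the actual experiments is described in the paper itself, not here.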