Paper: ECNU: Leveraging on Ensemble of Heterogeneous Features and Information Enrichment for Cross Level Semantic Similarity Estimation

ACL ID S14-2043
Title ECNU: Leveraging on Ensemble of Heterogeneous Features and Information Enrichment for Cross Level Semantic Similarity Estimation
Venue Joint Conference on Lexical and Computational Semantics
Session
Year 2014
Authors

This paper reports our submissions to the Cross Level Semantic Similarity (CLSS) task in SemEval 2014. We submitted one Random Forest regression system for each type of cross-level text pair, i.e., Paragraph to Sentence (P-S), Sentence to Phrase (S-Ph), Phrase to Word (Ph-W) and Word to Sense (W-Se). For text pairs at the P-S and S-Ph levels, we treat both texts as sentences and extract heterogeneous types of similarity features, i.e., string features, knowledge-based features, corpus-based features, syntactic features, machine translation based features, multi-level text features, etc. For text pairs at the Ph-W and W-Se levels, most of these features are not applicable or available because the texts carry too little information. To overcome this problem, we propose several information enrichment methods usi...
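
As a rough illustration of the "string features" family mentioned above, the following minimal sketch computes a few surface-overlap scores for a text pair. The specific features (word-level Jaccard, character trigram Jaccard, containment, length ratio), the whitespace tokenization, and all function names are assumptions made for illustration; they are not taken from the paper and may differ from the features the authors actually used. In a system of the kind described, a feature vector like this would be one input among many to a regressor such as a Random Forest.

```python
from typing import Dict, Set


def word_set(text: str) -> Set[str]:
    """Lowercase the text and split on whitespace into a set of tokens."""
    return set(text.lower().split())


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def string_features(longer: str, shorter: str) -> Dict[str, float]:
    """Hypothetical string-similarity features for a cross-level text pair."""
    lw, sw = word_set(longer), word_set(shorter)
    n_long, n_short = len(longer.split()), len(shorter.split())
    return {
        # Overlap of word types between the two texts.
        "word_jaccard": jaccard(lw, sw),
        # Overlap of character trigrams, which tolerates small spelling variation.
        "char3_jaccard": jaccard(char_ngrams(longer), char_ngrams(shorter)),
        # Fraction of the shorter text's word types that appear in the longer text.
        "containment": len(lw & sw) / len(sw) if sw else 0.0,
        # Token-count ratio of the shorter text to the longer text.
        "length_ratio": n_short / n_long if n_long else 0.0,
    }
```

For example, `string_features("the cat sat on the mat", "the cat")` yields a containment of 1.0, since every word of the shorter text occurs in the longer one, while the word-level Jaccard score is lower because the longer text has additional words.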