Paper: Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics

ACL ID D14-1005
Title Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors Douwe Kiela, Léon Bottou

We construct multi-modal concept representations by concatenating a skip-gram linguistic representation vector with a visual concept representation vector computed using the feature extraction layers of a deep convolutional neural network (CNN) trained on a large labeled object recognition dataset. This transfer learning approach brings a clear performance gain over features based on the traditional bag-of-visual-words approach. Experimental results are reported on the WordSim353 and MEN semantic relatedness evaluation tasks. We use visual features computed using either ImageNet or ESP Game images.
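
As a concrete illustration of the construction the abstract describes, below is a minimal Python sketch of concatenating a linguistic vector with a CNN visual feature vector and scoring relatedness by cosine similarity. The per-modality L2 normalization, the 300- and 4096-dimensional placeholder vectors, and the random inputs are illustrative assumptions, not the paper's exact setup; real skip-gram embeddings and CNN feature-layer activations would take their place.

    import numpy as np

    def l2_normalize(v, eps=1e-8):
        # Scale a vector to unit L2 norm (a common choice before
        # concatenating modalities; the exact scheme is an assumption).
        return v / (np.linalg.norm(v) + eps)

    def multimodal_embedding(linguistic_vec, visual_vec):
        # Concatenate the skip-gram vector with the CNN visual vector.
        return np.concatenate([l2_normalize(linguistic_vec),
                               l2_normalize(visual_vec)])

    def cosine(u, v):
        # Cosine similarity, as typically used for semantic relatedness.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Random placeholders standing in for a 300-d skip-gram embedding and
    # a 4096-d CNN feature-layer activation (dimensions are illustrative).
    rng = np.random.default_rng(0)
    ling_dog, vis_dog = rng.normal(size=300), rng.normal(size=4096)
    ling_cat, vis_cat = rng.normal(size=300), rng.normal(size=4096)

    dog = multimodal_embedding(ling_dog, vis_dog)
    cat = multimodal_embedding(ling_cat, vis_cat)
    print(f"relatedness(dog, cat) = {cosine(dog, cat):.3f}")

Normalizing each modality before concatenation keeps one modality's scale from dominating the combined vector; with real features, the multi-modal vectors would be evaluated by correlating cosine scores against human relatedness judgments such as WordSim353 and MEN.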