Paper: Improving Multi-Modal Representations Using Image Dispersion: Why Less is Sometimes More

ACL ID P14-2135
Title Improving Multi-Modal Representations Using Image Dispersion: Why Less is Sometimes More
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

Models that learn semantic representations from both linguistic and perceptual input outperform text-only models in many contexts and better reflect human concept acquisition. However, experiments suggest that while the inclusion of perceptual input improves representations of certain concepts, it degrades the representations of others. We propose an unsupervised method to determine whether to include perceptual input for a concept, and show that it significantly improves the ability of multi-modal models to learn and represent word meanings. The method relies solely on image data, and can be applied to a variety of other NLP tasks.
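
As a rough illustration of the filtering idea the abstract describes, the sketch below computes image dispersion for a concept as the mean pairwise cosine distance between its image feature vectors and uses it to decide whether to include perceptual input. The `include_perceptual_input` helper and the `threshold` value are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def image_dispersion(image_vectors):
        """Mean pairwise cosine distance between a concept's image vectors.

        Higher dispersion indicates visually diverse images, which tends to
        correlate with concepts for which perceptual input is less helpful.
        """
        vecs = np.asarray(image_vectors, dtype=float)
        # Normalise rows so dot products equal cosine similarities.
        vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        n = len(vecs)
        distances = [1.0 - vecs[i].dot(vecs[j])
                     for i in range(n) for j in range(i + 1, n)]
        return sum(distances) / len(distances)

    def include_perceptual_input(image_vectors, threshold=0.6):
        # Hypothetical gating rule: fuse image features only for concepts
        # whose dispersion falls below a tuned threshold.
        return image_dispersion(image_vectors) < threshold

In use, one would extract feature vectors for a sample of images returned for a word, compute the dispersion score, and fall back to the text-only representation when the score exceeds the threshold; the threshold itself would need to be tuned on held-out data.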