Paper: A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities

ACL ID D13-1115
Title A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013
Authors

Recent investigations into grounded models of language have shown that holistic views of language and perception can provide higher performance than independent views. In this work, we improve a two-dimensional multimodal version of Latent Dirichlet Allocation (Andrews et al., 2009) in various ways. (1) We outperform text-only models in two different evaluations, and demonstrate that low-level visual features are directly compatible with the existing model. (2) We present a novel way to integrate visual features into the LDA model using unsupervised clusters of images. The clusters are directly interpretable and improve on our evaluation tasks. (3) We provide two novel ways to extend the bimodal models to support three or more modalities. We find that the three-, four-, and five-dimensional...
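
The abstract does not describe the model in detail, but the general idea of making continuous visual features compatible with a discrete topic model can be illustrated with a minimal, hypothetical sketch: low-level image descriptors are clustered without supervision into discrete "visual words", which are then added to each document's textual tokens before fitting a standard LDA. This is an illustrative assumption, not the paper's actual multimodal LDA (which models the modalities jointly rather than concatenating tokens); all data, token names (e.g. the VIS_ prefix), and parameter choices below are placeholders.

    # Hypothetical sketch: discretize visual features into "visual words"
    # via k-means, merge them with textual tokens, and fit a standard LDA.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    rng = np.random.default_rng(0)

    # Toy corpus: textual tokens per document (placeholder data).
    text_docs = [
        "dog bark tail fur",
        "cat purr fur whiskers",
        "car engine wheel road",
    ]

    # Toy low-level visual features: one 64-d descriptor per document's image.
    visual_feats = rng.normal(size=(len(text_docs), 64))

    # Step 1: cluster the continuous descriptors into discrete "visual words".
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
    visual_word_ids = kmeans.fit_predict(visual_feats)

    # Step 2: append a pseudo-token such as "VIS_1" to each document's text,
    # so both modalities share one discrete vocabulary.
    joint_docs = [f"{text} VIS_{vid}" for text, vid in zip(text_docs, visual_word_ids)]

    # Step 3: fit a standard LDA over the joint token streams.
    counts = CountVectorizer().fit_transform(joint_docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(counts)

    print("Document-topic proportions:\n", doc_topics)

The sketch only shows why unsupervised image clusters are convenient: once each image is reduced to a cluster label, it behaves like any other count-based token, so no change to the underlying LDA machinery is required.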