Paper: More Words and Bigger Pictures

Title More Words and Bigger Pictures
Venue Joint Conference on Lexical and Computational Semantics
Year 2013

Object recognition is a little like translation: a picture (text in a source language) goes in, and a description (text in a target language) comes out. I will use this analogy, which has proven fertile, to describe recent progress in object recognition. We have very good methods to spot some objects in images, but extending these methods to produce descriptions of images remains very difficult. The description might come in the form of a set of words, indicating objects, and boxes or regions spanned by the object. This representation is difficult to work with, because some objects seem to be much more important than others, and because objects interact. An alternative is a sentence or a paragraph describing the picture, and recent work indicates how one might generate rich structure...