Paper: Extracting Salient Keywords From Instructional Videos Using Joint Text Audio And Visual Cues

ACL ID N06-2028
Title Extracting Salient Keywords From Instructional Videos Using Joint Text Audio And Visual Cues
Venue Human Language Technologies
Session Short Paper
Year 2006
Authors

This paper presents a multi-modal, feature-based system for extracting salient keywords from transcripts of instructional videos. Specifically, we propose to extract domain-specific keywords for videos by integrating various cues from linguistic and statistical knowledge, as well as derived sound classes and characteristic visual content types. The acquisition of such salient keywords will facilitate video indexing and browsing, and significantly improve the quality of current video search engines. Experiments on four government instructional videos show that 82% of the salient keywords appear in the top 50% of the highly ranked keywords. In addition, the audiovisual cues improve precision and recall by 1.1% and 1.5% respectively.
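
The reported figure (the share of salient keywords that fall in the top half of the ranked list) can be read as a coverage measure. The following is a minimal sketch of one way such a measure might be computed. The function name `coverage_at_fraction`, the truncating cutoff rule, and the toy keyword lists are assumptions made for illustration; they are not taken from the paper, which does not spell out its evaluation procedure here.

```python
from typing import Sequence, Set


def coverage_at_fraction(ranked: Sequence[str], salient: Set[str],
                         fraction: float = 0.5) -> float:
    """Share of salient keywords found in the top `fraction` of a ranked list.

    Hypothetical helper: the cutoff is assumed to be len(ranked) * fraction,
    truncated to an integer.
    """
    if not salient:
        raise ValueError("salient set must not be empty")
    cutoff = int(len(ranked) * fraction)  # number of top-ranked candidates kept
    top = set(ranked[:cutoff])
    return len(salient & top) / len(salient)


# Toy data, invented purely for illustration (not from the paper).
ranked_keywords = ["valve", "pressure", "the", "gauge",
                   "thing", "okay", "hose", "well"]
salient_keywords = {"valve", "pressure", "gauge", "hose"}

score = coverage_at_fraction(ranked_keywords, salient_keywords, 0.5)
print(f"coverage in top 50%: {score:.2f}")
```

On this toy list, three of the four salient keywords fall within the top four candidates, so the sketch reports a coverage of 0.75. Under the same reading, the paper's result would correspond to a value of 0.82.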