Paper: Combining Visual and Textual Features for Information Extraction from Online Flyers

ACL ID D14-1206
Title Combining Visual and Textual Features for Information Extraction from Online Flyers
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

Information in visually rich formats such as PDF and HTML is often conveyed by a combination of textual and visual features. In particular, genres such as marketing flyers and infographics often augment textual information with visual cues such as color, size, and positioning. As a result, traditional text-based approaches to information extraction (IE) may underperform on such documents. In this study, we present a supervised machine learning approach to IE from online commercial real estate flyers. We evaluated the performance of SVM classifiers on the task of identifying 12 types of named entities using a combination of textual and visual features. Results show that the addition of visual features such as color, size, and positioning significantly increased classifier performance.
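As an illustration of what combining textual and visual features can look like, the following is a minimal sketch and not the authors' implementation. The `Token` fields, the feature names, and the normalization by a page median font size are all assumptions made for this example; the abstract only states that color, size, and positioning were used alongside textual features.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """A text token plus hypothetical rendering attributes from a flyer."""
    text: str
    font_size: float  # in points (assumed unit)
    color: str        # hex string such as "#cc0000" (assumed encoding)
    x: float          # horizontal position, normalized to [0, 1]
    y: float          # vertical position, normalized to [0, 1]


def textual_features(tok: Token) -> dict:
    """Features derived from the token string alone."""
    return {
        "lower=" + tok.text.lower(): 1.0,
        "is_capitalized": float(tok.text[:1].isupper()),
        "has_digit": float(any(ch.isdigit() for ch in tok.text)),
    }


def visual_features(tok: Token, page_median_size: float) -> dict:
    """Features derived from how the token is rendered (illustrative only)."""
    return {
        "rel_font_size": tok.font_size / page_median_size,
        "is_colored": float(tok.color.lower() not in ("#000000", "#000")),
        "x": tok.x,
        "y": tok.y,
    }


def combined_features(tok: Token, page_median_size: float) -> dict:
    """Merge both feature groups into a single sparse feature dictionary."""
    feats = textual_features(tok)
    feats.update(visual_features(tok, page_median_size))
    return feats
```

A list of such dictionaries, one per token, could then be vectorized and passed to any SVM implementation, for instance scikit-learn's `DictVectorizer` followed by `LinearSVC`. That pairing is one possible choice for the sketch, not something the abstract specifies.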