Paper: Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

ACL ID: D12-1099
Title: Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2012
Authors:

Short listings such as classified ads or product listings abound on the web. If a computer can reliably extract information from them, it will greatly benefit a variety of applications. Short listings are, however, challenging to process due to their informal styles. In this paper, we present an unsupervised information extraction system for short listings. Given a corpus of listings, the system builds a semantic model that represents typical objects and their attributes in the domain of the corpus, and then uses the model to extract information. Two key features in the system are a semantic parser that extracts objects and their attributes and a listing-focused clustering module that helps group together extracted tokens of the same type. Our evaluation shows that the semantic m...
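The abstract does not say how the listing-focused clustering module works. As a rough, hypothetical illustration of the general idea of grouping tokens of the same type, the sketch below swaps in a generic distributional heuristic: tokens that occur with similar left and right neighbours across listings are placed together by single-pass (leader-style) cosine clustering. The function names `context_features`, `cosine`, and `cluster_tokens`, the left/right-neighbour features, the `threshold` value, and the toy listings are all assumptions made for this example. This is not the paper's method.

```python
# Hypothetical sketch: NOT the clustering module described in the paper.
from collections import Counter
from math import sqrt


def context_features(listings):
    """Map each token to a Counter of (side, neighbour) features collected
    across all listings. "<s>" and "</s>" mark listing boundaries."""
    features = {}
    for listing in listings:
        tokens = listing.lower().split()
        for i, tok in enumerate(tokens):
            left = tokens[i - 1] if i > 0 else "<s>"
            right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
            vec = features.setdefault(tok, Counter())
            vec[("L", left)] += 1
            vec[("R", right)] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_tokens(features, threshold=0.25):
    """Single-pass (leader-style) clustering: each token joins the cluster
    whose summed feature vector is most similar, provided that similarity
    reaches `threshold` (an assumed value); otherwise it starts a new cluster."""
    clusters = []  # each entry: [summed feature Counter, member tokens]
    for tok in sorted(features):
        vec = features[tok]
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].update(vec)  # Counter.update adds the counts
            best[1].append(tok)
        else:
            clusters.append([Counter(vec), [tok]])  # copy, don't alias
    return [members for _, members in clusters]


# Toy listings invented for illustration only.
listings = [
    "selling honda civic 2005",
    "selling toyota corolla 2005",
    "selling honda accord 2007",
    "selling toyota camry 2007",
]
clusters = cluster_tokens(context_features(listings))
print(clusters)
# [['2005', '2007'], ['accord', 'camry', 'civic', 'corolla'], ['honda', 'toyota'], ['selling']]
```

On these toy listings, years, model names, and makes each end up in their own group purely from shared neighbour patterns. Raw neighbour counts are a deliberately simple choice; real listings are noisier, and the greedy, order-dependent assignment can split or merge types that a more careful method would not.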