Paper: Unsupervised Relation Extraction of In-Domain Data from Focused Crawls

ACL ID E14-3002
Title Unsupervised Relation Extraction of In-Domain Data from Focused Crawls
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2014
Authors

This thesis proposal approaches unsupervised relation extraction from web data, which is collected by crawling only those parts of the web that are from the same domain as a relatively small reference corpus. The first part of this proposal is concerned with the efficient discovery of web documents for a particular domain and in a particular language. We create a combined, focused web crawling system that automatically collects relevant documents and minimizes the amount of irrelevant web content. The collected web data is semantically processed in order to acquire rich in-domain knowledge. Here, we focus on fully unsupervised relation extraction by employing the extended distributional hypothesis. We use distributional similarities between two pairs of nominals based on depend...
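
The focused-crawling idea in the abstract, keeping only pages that resemble a small reference corpus, can be illustrated with a minimal sketch. It is not the crawling system the proposal describes: the bag-of-words representation, the cosine measure, the 0.2 threshold, and the names `term_vector`, `cosine`, and `is_in_domain` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts (assumed stand-in for real preprocessing)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_in_domain(page_text, reference_vector, threshold=0.2):
    """Keep a crawled page only if it is similar enough to the reference corpus.

    The threshold is a hypothetical value; a real crawler would tune it.
    """
    return cosine(term_vector(page_text), reference_vector) >= threshold


# Toy reference corpus (invented text, for illustration only).
reference = term_vector(
    "patients received the drug and the treatment reduced symptoms"
)
```

A crawler built along these lines would call `is_in_domain` on each fetched page and follow outgoing links only from pages that pass, which is one simple way to limit the amount of irrelevant content collected.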