Paper: Character-level Analysis of Semi-Structured Documents for Set Expansion

ACL ID D09-1156
Title Character-level Analysis of Semi-Structured Documents for Set Expansion
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009
Authors

Set expansion refers to expanding a partial set of “seed” objects into a more complete set. One system that does set expansion is SEAL (Set Expander for Any Language), which expands entities automatically by utilizing resources from the Web in a language-independent fashion. In this paper, we illustrate in detail the construction of character-level wrappers for set expansion implemented in SEAL. We also evaluate several kinds of wrappers for set expansion and show that character-based wrappers perform better than HTML-based wrappers. In addition, we demonstrate a technique that extends SEAL to learn binary relational concepts (e.g., “x is the mayor of the city y”) from only two seeds. We also show that the extended SEAL has good performance on our evaluation datasets...
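The abstract does not spell out how a character-level wrapper is built, so the following is only a minimal sketch of the general idea, not SEAL's actual implementation. It assumes a wrapper is a pair of strings (left context, right context) obtained as the longest character sequences shared immediately before and after one occurrence of every seed. The function names (`learn_wrappers`, `extract`) and the sample document are hypothetical.

```python
import re
from itertools import product


def occurrences(doc, seed):
    """Return the start offset of every occurrence of seed in doc."""
    starts, i = [], doc.find(seed)
    while i != -1:
        starts.append(i)
        i = doc.find(seed, i + 1)
    return starts


def common_prefix(strings):
    """Longest string that every input starts with."""
    out = []
    for chars in zip(*strings):
        if len(set(chars)) != 1:
            break
        out.append(chars[0])
    return "".join(out)


def common_suffix(strings):
    """Longest string that every input ends with."""
    return common_prefix([s[::-1] for s in strings])[::-1]


def learn_wrappers(doc, seeds):
    """Assumed simplification: try one occurrence per seed and keep the
    shared left/right character contexts when both are non-empty."""
    contexts = [
        [(doc[:i], doc[i + len(s):]) for i in occurrences(doc, s)]
        for s in seeds
    ]
    wrappers = set()
    for combo in product(*contexts):
        left = common_suffix([l for l, _ in combo])
        right = common_prefix([r for _, r in combo])
        if left and right:
            wrappers.add((left, right))
    return wrappers


def extract(doc, wrapper):
    """Return every string bracketed by the wrapper's two contexts."""
    left, right = wrapper
    return re.findall(re.escape(left) + "(.+?)" + re.escape(right), doc)


# Hypothetical semi-structured page with three rows.
DOC = (
    "<table>\n"
    "<tr><td>Ford</td><td>USA</td></tr>\n"
    "<tr><td>Toyota</td><td>Japan</td></tr>\n"
    "<tr><td>Nissan</td><td>Japan</td></tr>\n"
    "</table>"
)

wrappers = learn_wrappers(DOC, ["Ford", "Toyota"])
for w in wrappers:
    print(extract(DOC, w))  # ['Ford', 'Toyota', 'Nissan']
```

In this toy case the learned left context is `>\n<tr><td>`, which begins in the middle of the preceding tag. A wrapper restricted to whole HTML tags could not express that boundary, which is one intuition for why character-level contexts can be more specific.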