Paper: Group based Self Training for E-Commerce Product Record Linkage

ACL ID C14-1124
Title Group based Self Training for E-Commerce Product Record Linkage
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014
Authors

In this paper, we study the task of product record linkage across multiple e-commerce websites. We solve this task via a semi-supervised approach and adopt the self-training algorithm for learning with little labeled data. In previous self-training algorithms, the learner tries to convert the most confidently predicted unlabeled examples of each class into labeled training examples. However, these algorithms evaluate the confidence of an instance based only on the individual evidence from that instance; the correlation among data instances is rarely considered. To address this, we develop a novel variant of the self-training algorithm by leveraging the data characteristics of the task of product record linkage. We jointly consider a candidate linked pair and its corresponding correlated pairs as a group...
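
The classic self-training loop that the abstract takes as its starting point can be sketched roughly as follows. This is a minimal illustration of the baseline only, not the paper's group-based variant, whose details are not given here. Everything concrete in it is an assumption made for the sketch: each candidate pair is reduced to a single hypothetical similarity score, the learner is a toy nearest-centroid classifier, confidence is the distance margin between the two closest centroids, and the names `fit_centroids`, `predict_with_confidence`, `self_train`, `per_class`, and `rounds` are invented for illustration.

```python
from statistics import mean


def fit_centroids(labeled):
    """Return the mean feature value per class from (score, label) pairs."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}


def predict_with_confidence(centroids, x):
    """Predict the nearest class for x.

    Confidence is the margin between the distance to the runner-up centroid
    and the distance to the winning centroid (assumes at least two classes).
    """
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled, unlabeled, per_class=1, rounds=10):
    """Baseline self-training: in each round, promote the most confidently
    predicted unlabeled examples of each class into the labeled set.

    Confidence here uses only the individual instance, which is exactly the
    limitation the abstract points out.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        centroids = fit_centroids(labeled)
        # Each entry: (pool index, score, predicted label, confidence).
        scored = [(i, x) + predict_with_confidence(centroids, x)
                  for i, x in enumerate(pool)]
        promoted = []
        for cls in centroids:
            ranked = sorted((s for s in scored if s[2] == cls),
                            key=lambda s: s[3], reverse=True)
            promoted.extend(ranked[:per_class])
        if not promoted:
            break
        labeled.extend((x, y) for _, x, y, _ in promoted)
        taken = {i for i, _, _, _ in promoted}
        pool = [x for i, x in enumerate(pool) if i not in taken]
    return labeled, fit_centroids(labeled)
```

Calling `self_train([(0.9, "match"), (0.1, "non-match")], [0.8, 0.2, 0.6, 0.4])` starts from one labeled example per class and gradually absorbs the unlabeled scores, taking the clearest cases first. A group-based variant would replace the per-instance margin with a confidence computed over a candidate pair together with its correlated pairs; how that is done is left to the paper itself.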