Paper: Tri-Training for Authorship Attribution with Limited Training Data

ACL ID P14-2057
Title Tri-Training for Authorship Attribution with Limited Training Data
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

Authorship attribution (AA) aims to identify the authors of a set of documents. Traditional studies in this area often assume that a large set of labeled documents is available for training. In real life, however, it is often difficult or expensive to collect a large set of labeled data. For example, in the online review domain, most reviewers (authors) write only a few reviews, which are too few to serve as training data for accurate classification. In this paper, we present a novel three-view tri-training method that iteratively identifies the authors of unlabeled documents to augment the training set. The key idea is to first represent each document in three distinct views and then perform tri-training to exploit the large amount of unlabeled documents. Starting ...
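The abstract describes the idea only at a high level, so the following is a minimal sketch of generic three-view tri-training, not the authors' implementation. Everything in it is an assumption chosen for illustration: the three views (`char_view`, `word_view`, `bigram_view`) are hypothetical stand-ins for whatever feature views the paper uses, `NaiveBayes` is a small add-one-smoothed multinomial classifier, and `tri_train`, `predict`, and `max_rounds` are illustrative names. Each view keeps its own growing training set. When the other two views agree on the author of an unlabeled document, that document is added, with the agreed label, to the remaining view's training set. The loop stops when a round adds nothing or after `max_rounds`.

```python
import math
from collections import Counter

# Hypothetical stand-in views; the paper's actual feature views may differ.
def char_view(text):
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))  # character 3-grams

def word_view(text):
    return Counter(text.lower().split())  # word unigrams

def bigram_view(text):
    w = text.lower().split()
    return Counter(zip(w, w[1:]))  # word bigrams

VIEWS = [char_view, word_view, bigram_view]

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over a bag of features."""

    def fit(self, feats, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for f, y in zip(feats, labels):
            self.counts[y].update(f)
        self.vocab = set().union(*(self.counts[c] for c in self.classes))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, f):
        v = len(self.vocab) + 1  # +1 reserves mass for unseen features
        def log_score(c):
            s = math.log(self.prior[c] / self.n)
            for tok, k in f.items():
                s += k * math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            return s
        return max(self.classes, key=log_score)

def fit_views(train):
    # Train one classifier per view on that view's own training set.
    return [NaiveBayes().fit([view(t) for t, _ in data], [y for _, y in data])
            for view, data in zip(VIEWS, train)]

def tri_train(labeled, unlabeled, max_rounds=10):
    """labeled: list of (text, author) pairs; unlabeled: list of texts."""
    train = [list(labeled) for _ in VIEWS]  # one growing training set per view
    pool = list(unlabeled)
    models = fit_views(train)
    for _ in range(max_rounds):
        newly_labeled = set()
        for j, text in enumerate(pool):
            votes = [m.predict(view(text)) for m, view in zip(models, VIEWS)]
            for i in range(3):
                a, b = (k for k in range(3) if k != i)
                if votes[a] == votes[b]:  # the other two views agree
                    train[i].append((text, votes[a]))
                    newly_labeled.add(j)
        if not newly_labeled:
            break  # no view gained new training data this round
        pool = [t for j, t in enumerate(pool) if j not in newly_labeled]
        models = fit_views(train)
    return models

def predict(models, text):
    # Majority vote over the three views (a three-way tie falls to the first view).
    votes = Counter(m.predict(view(text)) for m, view in zip(models, VIEWS))
    return votes.most_common(1)[0][0]

# Toy usage with made-up documents.
LABELED = [("the cat sat on the mat", "alice"), ("a cat and a hat", "alice"),
           ("stocks rose sharply in trading today", "bob"),
           ("markets fell as trading slowed", "bob")]
UNLABELED = ["the cat sat on a hat", "stocks rose in trading today"]
models = tri_train(LABELED, UNLABELED)
print(predict(models, "the cat sat on the mat"))
```

Requiring agreement from two independent views before pseudo-labeling is what lets each view learn from documents it could not label confidently on its own. The sketch omits any safeguards the paper may add, such as confidence thresholds or limits on how many documents each round may label.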