Paper: Cross-Topic Authorship Attribution: Will Out-Of-Topic Data Help?

ACL ID C14-1116
Title Cross-Topic Authorship Attribution: Will Out-Of-Topic Data Help?
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014
Authors

Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. In real scenarios, however, this assumption is too strong. The goal of this study is to improve prediction results in cross-topic AA (CTAA), where the training data comes from one topic but the test data comes from another. Our proposed idea is to build a predictive model for one topic using documents from all other available topics. In addition to improving the performance of CTAA, we also present a thorough analysis of the topic sensitivity of the four feature types most commonly used in AA. We empirically illustrate that our proposed framework is significantly better than one trained on a single out-of-domain topic and is as effective, in some cases, ...
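The cross-topic setup described above (hold out one topic for testing, train on documents from all remaining topics) can be sketched as a leave-one-topic-out evaluation loop. This is a minimal illustration, not the paper's actual pipeline: the toy corpus, the logistic-regression classifier, and the character n-gram features are all assumptions chosen for brevity (character n-grams being one of the feature types commonly used in AA).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline

# Toy corpus: two authors ("A", "B") each writing in three topics.
# Illustrative data only; the paper's corpus is not reproduced here.
docs = [
    "the ship sailed swiftly across the calm sea",
    "waves crashed hard on the rocky shore tonight",
    "the markets rose swiftly after the calm report",
    "prices crashed hard after the rocky earnings call",
    "the team played swiftly in the calm first half",
    "defenders crashed hard into the rocky midfield",
]
authors = ["A", "B", "A", "B", "A", "B"]
topics = ["sea", "sea", "finance", "finance", "sport", "sport"]

# Character n-gram features with a simple linear classifier
# (an assumed stand-in for the paper's models).
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3)),
    LogisticRegression(max_iter=1000),
)

# Leave-one-topic-out: each fold tests on one topic and trains on
# documents from all other topics, mirroring the CTAA setting.
logo = LeaveOneGroupOut()
n_folds = 0
for train_idx, test_idx in logo.split(docs, authors, groups=topics):
    X_train = [docs[i] for i in train_idx]
    y_train = [authors[i] for i in train_idx]
    model.fit(X_train, y_train)
    preds = model.predict([docs[i] for i in test_idx])
    n_folds += 1

print(n_folds)  # one fold per held-out topic
```

The grouping by topic is what distinguishes this from ordinary cross-validation: no document from the test topic ever appears in training, so the classifier must rely on topic-independent stylistic cues.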