Paper: Author Verification Using Common N-Gram Profiles of Text Documents

ACL ID C14-1038
Title Author Verification Using Common N-Gram Profiles of Text Documents
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014
Authors

Authorship verification is the problem of deciding whether a sample text document was written by a specific person, given a few other documents known to be authored by that person. We propose a proximity-based method for one-class classification that applies the Common N-Gram (CNG) dissimilarity measure. The CNG dissimilarity (Kešelj et al., 2003) is based on the differences in the frequencies of the n-grams of tokens (characters, words) that are most common in the considered documents. Our method uses the pairs of most dissimilar documents among the documents of known authorship. We evaluate several variants of the method, as a single classifier or as an ensemble of classifiers, on a multilingual authorship verification corpus of the PAN 2013 Author Identification ev...
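
To make the CNG dissimilarity mentioned in the abstract concrete, the following is a minimal sketch of it for character n-grams. It assumes the commonly cited form of the measure from Kešelj et al. (2003): each document is reduced to a profile of its most frequent n-grams with normalized frequencies, and the dissimilarity sums the squared relative frequency differences over the union of the two profiles. The function names `profile` and `cng_dissimilarity`, the n-gram length, and the profile size are illustrative assumptions, not values taken from the paper, and the sketch does not cover the verification method itself.

```python
from collections import Counter


def profile(text, n=3, size=500):
    """Build a profile: the `size` most frequent character n-grams of `text`,
    mapped to their normalized frequencies. `n` and `size` are illustrative
    defaults, not settings reported in the paper."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    if total == 0:
        return {}  # text shorter than n characters has no n-grams
    return {g: c / total for g, c in grams.most_common(size)}


def cng_dissimilarity(p1, p2):
    """Sum, over every n-gram in either profile, of the squared difference of
    its two frequencies divided by their mean (assumed CNG formula)."""
    total = 0.0
    for g in p1.keys() | p2.keys():
        f1 = p1.get(g, 0.0)  # an n-gram missing from a profile counts as 0
        f2 = p2.get(g, 0.0)
        total += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return total


if __name__ == "__main__":
    known = profile("the quick brown fox jumps over the lazy dog")
    sample = profile("the quick brown cat sleeps under the lazy dog")
    print(cng_dissimilarity(known, sample))
```

Under this formula, identical profiles have dissimilarity 0, and each n-gram that appears in only one profile contributes exactly 4, so the value grows as the two profiles share fewer of their most common n-grams.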