Paper: Short Text Authorship Attribution via Sequence Kernels, Markov Chains and Author Unmasking: An Investigation

ACL ID W06-1657
Title Short Text Authorship Attribution via Sequence Kernels, Markov Chains and Author Unmasking: An Investigation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2006
Authors

We present an investigation of recently proposed character and word sequence kernels for the task of authorship attribution based on relatively short texts. Performance is compared with two corresponding probabilistic approaches based on Markov chains. Several configurations of the sequence kernels are studied on a relatively large dataset (50 authors), where each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and is better than that of word sequence kernels. The results further suggest that when using a realistic setup that takes into account the case of texts which are not written by any hypothesised authors, the amount of training material has more...