Paper: Enhancing Authorship Attribution By Utilizing Syntax Tree Profiles

ACL ID E14-4038
Title Enhancing Authorship Attribution By Utilizing Syntax Tree Profiles
Venue Annual Meeting of the European Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

The aim of modern authorship attribution approaches is to analyze known authors and to assign authorship of previously unseen, unlabeled text documents based on various features. In this paper we present a novel feature that enhances current attribution methods by analyzing the grammar of authors. To extract the feature, the syntax tree of each sentence in a document is computed and then split into length-independent patterns using pq-grams. The most frequently used pq-grams are then used to compose sample profiles of authors, which are compared with the profile of the unlabeled document using various distance metrics and similarity scores. An evaluation using three different and independent data sets reveals promising results and indicates that the grammar of authors is a...
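The pq-gram extraction step described above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes a sentence's syntax tree is already available as nested `(label, children)` tuples (parsing is not shown), uses illustrative values p=2 and q=3, and the names `pq_grams` and `document_pq_grams` are hypothetical. Following the usual pq-gram definition, each pattern joins an anchor node and its p-1 ancestors (the stem) with q consecutive children (the base), padding missing nodes with `*`. Because every pattern has the same fixed size, sentences of any length produce comparable features.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical tree format: (label, [children]); leaves have an empty child list.
Tree = Tuple[str, List["Tree"]]
NULL = "*"  # padding label for missing ancestors/children


def pq_grams(tree: Tree, p: int = 2, q: int = 3) -> Counter:
    """Return the bag (multiset) of pq-grams of an ordered, labeled tree."""
    grams: Counter = Counter()

    def visit(node: Tree, ancestors: List[str]) -> None:
        label, children = node
        # Stem: the anchor node plus its p-1 nearest ancestors.
        stem = (ancestors + [label])[-p:]
        # Base: a sliding window over q consecutive children, padded with '*'.
        base = [NULL] * q
        if not children:
            grams[tuple(stem + base)] += 1
            return
        for child in children:
            base = base[1:] + [child[0]]
            grams[tuple(stem + base)] += 1
            visit(child, stem)
        # Slide the window past the last child so the trailing children are covered too.
        for _ in range(q - 1):
            base = base[1:] + [NULL]
            grams[tuple(stem + base)] += 1

    visit(tree, [NULL] * (p - 1))
    return grams


def document_pq_grams(trees: Iterable[Tree], p: int = 2, q: int = 3) -> Counter:
    """Aggregate the pq-grams of all sentence trees of one document."""
    return sum((pq_grams(t, p, q) for t in trees), Counter())


if __name__ == "__main__":
    tree = ("S", [("NP", []), ("VP", [])])
    print(sum(pq_grams(tree).values()))  # 6
```

For the toy tree `S -> (NP, VP)` this yields six pq-grams, among them `('*', 'S', '*', '*', 'NP')` and `('S', 'NP', '*', '*', '*')`. Whether leaves hold words or only grammatical labels is a preprocessing choice this sketch leaves open.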
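The profile construction and comparison step can also be sketched. The abstract does not name the distance metrics used, so this sketch assumes two measures known from the authorship attribution literature: the common n-gram (CNG) dissimilarity of Kešelj et al. (2003) and the simplified profile intersection (SPI) of Frantzeskou et al. The profile size `k`, normalizing by the document's total pq-gram count, and all function names are hypothetical choices for illustration. Each pq-gram is treated as an opaque hashable key.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping

Profile = Dict[Hashable, float]


def build_profile(counts: Counter, k: int = 500) -> Profile:
    """Keep the k most frequent pq-grams, normalized by the document's total count."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.most_common(k)}


def cng_dissimilarity(a: Profile, b: Profile) -> float:
    """Common n-gram dissimilarity (Keselj et al., 2003); 0.0 means identical profiles.

    Assumes all stored frequencies are positive, so fa + fb is never zero.
    """
    score = 0.0
    for gram in set(a) | set(b):
        fa, fb = a.get(gram, 0.0), b.get(gram, 0.0)
        score += (2 * (fa - fb) / (fa + fb)) ** 2
    return score


def spi(a: Profile, b: Profile) -> int:
    """Simplified profile intersection: how many pq-grams the two profiles share."""
    return len(set(a) & set(b))


def attribute(unknown: Profile, authors: Mapping[str, Profile]) -> str:
    """Pick the author whose sample profile is least dissimilar to the unknown one."""
    return min(authors, key=lambda name: cng_dissimilarity(unknown, authors[name]))
```

For instance, with `unknown = build_profile(Counter({"a": 3, "b": 1}))`, a candidate built from the same counts scores 0.0, one built from `Counter({"c": 4})` scores 12.0, and `attribute` returns the former. SPI is a similarity rather than a distance, so it would be maximized instead of minimized.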