Paper: Stylometric Analysis of Scientific Articles

ACL ID: N12-1033
Title: Stylometric Analysis of Scientific Articles
Venue: Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session: Main Conference
Year: 2012

We present an approach to automatically recover hidden attributes of scientific articles, such as whether the author is a native English speaker, whether the author is male or female, and whether the paper was published in conference or workshop proceedings. We train classifiers to predict these attributes in computational linguistics papers. The classifiers perform well in this challenging domain, identifying non-native writing with 95% accuracy (over a baseline of 67%). We show the benefits of using syntactic features in stylometry: syntax leads to significant improvements over bag-of-words models on all three tasks, achieving 10% to 25% relative error reduction. We give a detailed analysis of which words and syntax most predict a particular attribute, and we show a stron...
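Relative error reduction measures how much of a reference system's error a new system removes, rather than the raw accuracy gap. The short sketch below computes it from two accuracies. As an illustration, it plugs in the non-native detection figures quoted above (95% accuracy against a 67% baseline). The helper name `relative_error_reduction` is hypothetical and does not come from the paper. The abstract does not give the per-task accuracies behind its 10% to 25% figure for syntax over bag-of-words, so none are assumed here.

```python
def relative_error_reduction(reference_acc: float, system_acc: float) -> float:
    """Fraction of the reference system's errors that the new system eliminates.

    Hypothetical helper for illustration; accuracies are given in [0, 1].
    """
    reference_err = 1.0 - reference_acc  # error rate of the reference (e.g. baseline)
    system_err = 1.0 - system_acc        # error rate of the new classifier
    return (reference_err - system_err) / reference_err


# Non-native writing detection, figures quoted in the abstract:
# 95% accuracy over a 67% baseline, so the error drops from 33% to 5%.
rer = relative_error_reduction(0.67, 0.95)
print(f"{rer:.1%}")  # prints 84.8%
```

The same function applies to the syntax vs. bag-of-words comparison. In that case the bag-of-words model is the reference, so a result between 0.10 and 0.25 corresponds to the 10% to 25% range the abstract reports.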