Paper: A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books

ACL ID S13-1035
Title A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books
Venue Joint Conference on Lexical and Computational Semantics
Session
Year 2013
Authors

We created a dataset of syntactic-ngrams (counted dependency-tree fragments) based on a corpus of 3.5 million English books. The dataset includes over 10 billion distinct items covering a wide range of syntactic configurations. It also includes temporal information, facilitating new kinds of research into lexical semantics over time. This paper describes the dataset, the syntactic representation, and the kinds of information provided.
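
To make the idea of "counted dependency-tree fragments" with temporal information concrete, the following is a minimal sketch that counts the simplest possible fragment, a single head-modifier arc, per publication year. It is an illustration only and does not reproduce the dataset's actual file format or extraction pipeline. The toy corpus, the tuple layout (word, dependency label, 1-based head index), the label names, and the function name count_arcs_by_year are all assumptions introduced here.

```python
from collections import Counter, defaultdict

# Hypothetical toy input (not the dataset's real format): each entry is
# (year, tokens), and each token is (word, dependency_label, head_index).
# head_index is 1-based; 0 marks the root of the sentence.
TOY_CORPUS = [
    (1900, [("dogs", "nsubj", 2), ("bark", "ROOT", 0)]),
    (1900, [("dogs", "nsubj", 2), ("bark", "ROOT", 0), ("loudly", "advmod", 2)]),
    (1950, [("dogs", "nsubj", 2), ("sleep", "ROOT", 0)]),
]


def count_arcs_by_year(corpus):
    """Count (head word, label, modifier word) arcs, broken down by year."""
    counts = defaultdict(Counter)
    for year, tokens in corpus:
        for word, label, head in tokens:
            if head == 0:
                continue  # the root token has no head, so it forms no arc
            head_word = tokens[head - 1][0]
            counts[(head_word, label, word)][year] += 1
    return counts


counts = count_arcs_by_year(TOY_CORPUS)
for fragment, per_year in sorted(counts.items()):
    print(fragment, dict(per_year))
```

Each key plays the role of one distinct item, and the per-year counter is the temporal information attached to it. Larger fragments (several connected arcs) could be counted the same way by using a richer key.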