Paper: Biber Redux: Reconsidering Dimensions of Variation in American English

ACL ID C14-1054
Title Biber Redux: Reconsidering Dimensions of Variation in American English
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014
Authors

Genre classification has been found to improve performance in many applications of statistical NLP, including language modeling for spoken language, domain adaptation of statistical parsers, and machine translation. It has also been found to benefit retrieval of spoken or written documents. At its base, however, classification assumes separability. This paper revisits an assumption that genre variation is continuous along multiple dimensions, and an early use of principal component analysis to find these dimensions. Results on a very heterogeneous corpus of post-1990s American English reveal four major dimensions, three of which echo those found in prior work and the fourth depending on features not used in the earlier study. The resulting model can provide a basis for more detailed a...