Paper: Biases in Predicting the Human Language Model

ACL ID P14-2002
Title Biases in Predicting the Human Language Model
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014

We consider the prediction of three human behavioral measures (lexical decision, word naming, and picture naming) through the lens of domain bias in language modeling. Contrasting the predictive ability of statistics derived from 6 different corpora, we find intuitive results showing that, e.g., a British corpus over-predicts the speed with which an American will react to the words ward and duke, and that the Google n-grams over-predicts familiarity with technology terms. This study aims to provoke increased consideration of the human language model by NLP practitioners: biases are not limited to differences between corpora (i.e., "train" vs. "test"); they can exist as well between corpora and the intended user of the resultant technology.