Paper: Biases in Predicting the Human Language Model

ACL ID P14-2002
Title Biases in Predicting the Human Language Model
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

We consider the prediction of three human behavioral measures (lexical decision, word naming, and picture naming) through the lens of domain bias in language modeling. Contrasting the predictive ability of statistics derived from 6 different corpora, we find intuitive results showing that, e.g., a British corpus overpredicts the speed with which an American will react to the words "ward" and "duke", and that the Google n-grams overpredict familiarity with technology terms. This study aims to provoke increased consideration of the human language model by NLP practitioners: biases are not limited to differences between corpora (i.e., "train" vs. "test"); they can also exist between corpora and the intended user of the resultant technology.
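The abstract does not specify the statistical model, so the following is only a minimal sketch of what "a corpus overpredicts the speed of reaction to a word" can mean. It assumes that reaction time (RT) is regressed linearly on add-one-smoothed log corpus frequency. A word whose observed RT sits well above the fitted line is one the corpus makes look more familiar than it is for the tested speakers. The function names `fit_line` and `overpredicted_words`, the `margin` parameter, and the toy counts and RTs are hypothetical illustrations, not the paper's method or data.

```python
import math
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x with one predictor; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def overpredicted_words(
    counts: Dict[str, int], rts: Dict[str, float], margin: float = 30.0
) -> List[str]:
    """Hypothetical helper: words whose observed RT exceeds the
    frequency-based prediction by more than `margin` milliseconds.

    A large positive residual means the corpus predicts a faster response
    (i.e., more familiarity) than the human participants actually showed.
    """
    words = sorted(rts)
    total = sum(counts.values())
    vocab = len(counts)
    # Assumed predictor: log relative frequency with add-one smoothing,
    # so words absent from the corpus still get a finite value.
    xs = [math.log((counts.get(w, 0) + 1) / (total + vocab)) for w in words]
    ys = [rts[w] for w in words]
    intercept, slope = fit_line(xs, ys)
    # Residual = observed RT minus predicted RT.
    return [w for w, x, y in zip(words, xs, ys) if y - (intercept + slope * x) > margin]
```

For example, with made-up counts `{"the": 1000, "house": 100, "duke": 100, "zebra": 10}` and made-up RTs where "duke" is answered 100 ms slower than the equally frequent "house", the sketch flags only "duke". Comparing such residuals across corpora fitted to the same RTs is one simple way to see which corpus is biased away from a given population.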