Paper: A discriminative language model with pseudo-negative samples

ACL ID P07-1010
Title A discriminative language model with pseudo-negative samples
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2007
  • Daisuke Okanohara (University of Tokyo, Tokyo Japan)
  • Jun'ichi Tsujii (University of Tokyo, Tokyo Japan; University of Manchester, Manchester UK; National Center for Text Mining, UK)

In this paper, we propose a novel discrim- inative language model, which can be ap- plied quite generally. Compared to the well known N-gram language models, dis- criminative language models can achieve more accurate discrimination because they can employ overlapping features and non- local information. However, discriminative language models have been used only for re-ranking in specific applications because negative examples are not available. We propose sampling pseudo-negative examples taken from probabilistic language models. However, this approach requires prohibitive computational cost if we are dealing with quite a few features and training samples. We tackle the problem by estimating the la- tent information in sentences using a semi- Markov class model, and then extracting featur...