Paper: Corpus Representativeness For Syntactic Information Acquisition

ACL ID P04-3012
Title Corpus Representativeness For Syntactic Information Acquisition
Venue Annual Meeting of the Association for Computational Linguistics
Session System Demonstration
Year 2004
Authors
  • Núria Bel (Pompeu Fabra University, Barcelona, Spain)

This paper describes part of our research on the automatic acquisition of computational lexicon information from corpora. It reports ongoing work on corpus representativeness. For the task of inducing information from text, we wanted to establish a certain degree of confidence in the size and composition of the document collection to be observed. The results show that it is possible to work with a relatively small corpus if it is tuned to a particular domain. Moreover, a small domain-tuned corpus appears to be more informative for real parsing than a general corpus.
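The abstract does not say how confidence in corpus size was measured. One common way to make the question concrete is to check whether the relative frequencies of the syntactic features observed for a word stop changing as more text is added. The subcategorization frames of a verb are one example of such features. This approach is an assumption made for illustration and is not taken from the paper. The following is a minimal sketch under that assumption. The functions `relative_frequencies`, `total_variation` and `stabilization_point`, as well as the `step` and `threshold` parameters, are hypothetical names introduced only for this example.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def relative_frequencies(observations: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each feature label (e.g. a frame) in the sample."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance: half the L1 distance between two distributions."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


def stabilization_point(
    observations: Sequence[str], step: int, threshold: float
) -> Optional[int]:
    """Hypothetical representativeness check (an illustrative assumption).

    Grows the sample in increments of `step` observations and returns the
    smallest prefix size after which every further increment changes the
    observed distribution by less than `threshold` (total variation).
    Returns None if the distribution is still changing at the end.
    """
    sizes = list(range(step, len(observations) + 1, step))
    dists = [relative_frequencies(observations[:n]) for n in sizes]
    # changes[i] is the shift between prefix sizes[i] and sizes[i + 1]
    changes = [total_variation(a, b) for a, b in zip(dists, dists[1:])]
    for i in range(len(changes)):
        if all(c < threshold for c in changes[i:]):
            return sizes[i]
    return None


if __name__ == "__main__":
    # Toy frame sequence: 20 NP-only occurrences, then alternating NP/PP.
    toy = ["NP"] * 20 + ["NP", "PP"] * 40
    print(stabilization_point(toy, step=10, threshold=0.04))
```

Under this assumption, a lower stabilization point for a domain-tuned sample than for a general one would correspond to the abstract's claim that a small tuned corpus can suffice. The toy sequence and threshold above are invented and are not results from the paper.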