Paper: Evaluating CETEMPublico A Free Resource For Portuguese

ACL ID P01-1058
Title Evaluating CETEMPublico A Free Resource For Portuguese
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2001

In this paper we present a thorough evaluation of a corpus resource for Portuguese, CETEMPúblico, a 180-million word newspaper corpus free for R&D in Portuguese processing. We provide information that should be useful to those using the resource, and that should lead to considerable improvement in later versions. In addition, we think that the procedures presented can be of interest to the larger NLP community, since corpus evaluation and description is unfortunately not a common exercise.

1 Introduction

CETEMPúblico is a large corpus of European Portuguese newspaper language, available at no cost to the community dealing with the processing of Portuguese. It was created in the framework of the Computational Processing of Portuguese project, a government-funded initiative...