Paper: Ensemble Methods For Automatic Thesaurus Extraction

ACL ID W02-1029
Title Ensemble Methods For Automatic Thesaurus Extraction
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2002

Ensemble methods are state of the art for many NLP tasks. Recent work by Banko and Brill (2001) suggests that this would not necessarily be true if very large training corpora were available. However, their results are limited by the simplicity of their evaluation task and individual classifiers. Our work explores ensemble efficacy for the more complex task of automatic thesaurus extraction on up to 300 million words. We examine our conflicting results in terms of the constraints on, and complexity of, different contextual representations, which contribute to the sparseness- and noise-induced bias behaviour of NLP systems on very large corpora.