Paper: Scaling Context Space

ACL ID P02-1030
Title Scaling Context Space
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2002

Context is used in many NLP systems as an indicator of a term's syntactic and semantic function. The accuracy of the system is dependent on the quality and quantity of contextual information available to describe each term. However, the quantity variable is no longer fixed by limited corpus resources. Given fixed training time and computational resources, it makes sense for systems to invest time in extracting high quality contextual information from a fixed corpus. However, with an effectively limitless quantity of text available, extraction rate and representation size need to be considered. We use thesaurus extraction with a range of context extracting tools to demonstrate the interaction between context quantity, time and size on a corpus of 300 million words.
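The abstract does not specify the extraction tools used, but the general pipeline it describes (collect contextual features for each term, then compare terms by the similarity of their context sets) can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a simple fixed-window context extractor with (relative position, word) features and a weighted Jaccard similarity, both hypothetical choices for this sketch.

```python
from collections import Counter, defaultdict

def extract_window_contexts(tokens, window=2):
    """Collect directional window co-occurrences as context features.

    Each feature is a (relative-position, word) pair, one simple style
    of context representation for distributional thesaurus extraction.
    """
    contexts = defaultdict(Counter)
    for i, term in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                contexts[term][(j - i, tokens[j])] += 1
    return contexts

def jaccard_similarity(a, b):
    """Weighted Jaccard similarity over two context-feature Counters."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0

tokens = "the cat sat on the mat and the dog sat on the rug".split()
ctx = extract_window_contexts(tokens)
print(jaccard_similarity(ctx["cat"], ctx["dog"]))  # → 0.75
```

The trade-offs the abstract studies show up directly here: a richer extractor (wider windows, parsed grammatical relations instead of raw positions) yields higher quality features but a lower extraction rate, and each extra feature grows the per-term representation that must be stored and compared.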