Paper: Applying Extrasentential Context To Maximum Entropy Based Tagging With A Large Semantic And Syntactic Tagset

ACL ID W99-0607
Title Applying Extrasentential Context To Maximum Entropy Based Tagging With A Large Semantic And Syntactic Tagset
Venue 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Session Main Conference
Year 1999
Authors

Experiments are presented which measure the perplexity reduction derived from incorporating, into the predictive model utilised in a standard tag-n-gram part-of-speech tagger, contextual information from previous sentences of a document. The tagset employed is the roughly-3000-tag ATR General English Tagset, whose tags are both syntactic and semantic in nature. The kind of extrasentential information provided to the tagger is semantic, and consists in the occurrence or non-occurrence, within the past 6 sentences of the document being tagged, of words tagged with particular tags from the tagset, and of boolean combinations of such conditions. In some cases, these conditions are combined with the requirement that the word being tagged belong to a particular set of words thought most l...
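The kind of extrasentential condition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tag names (`NN_MONEY`, `NN_RIVER`) and the word set are hypothetical, chosen only to show how occurrence, non-occurrence and boolean combinations over the past 6 sentences could be expressed as binary features of the sort a maximum entropy model consumes.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

# A tagged sentence is a list of (word, tag) pairs. Tag names used in the
# example below are invented for illustration, not ATR tagset entries.
TaggedSentence = List[Tuple[str, str]]
Condition = Callable[[Set[str]], bool]

WINDOW = 6  # the abstract's "past 6 sentences"


def tags_in_window(history: Sequence[TaggedSentence], window: int = WINDOW) -> Set[str]:
    """Collect every tag assigned within the last `window` sentences."""
    recent = history[-window:] if window > 0 else []
    return {tag for sentence in recent for _, tag in sentence}


def occurred(tag: str) -> Condition:
    """True if some word in the window carries `tag`."""
    return lambda seen: tag in seen


def not_(cond: Condition) -> Condition:
    return lambda seen: not cond(seen)


def and_(a: Condition, b: Condition) -> Condition:
    return lambda seen: a(seen) and b(seen)


def or_(a: Condition, b: Condition) -> Condition:
    return lambda seen: a(seen) or b(seen)


def make_feature(condition: Condition, word_set: Optional[Set[str]] = None):
    """Build a binary feature; optionally restrict it to a set of words."""
    def feature(history: Sequence[TaggedSentence], current_word: str) -> int:
        if word_set is not None and current_word not in word_set:
            return 0
        return int(condition(tags_in_window(history)))
    return feature


# Hypothetical example: fire when a money-related tag was seen recently,
# no river-related tag was seen, and the word being tagged is "bank".
history = [
    [("the", "DET"), ("loan", "NN_MONEY")],
    [("it", "PRON"), ("grew", "VB_CHANGE")],
]
bank_feature = make_feature(
    and_(occurred("NN_MONEY"), not_(occurred("NN_RIVER"))),
    word_set={"bank"},
)
print(bank_feature(history, "bank"))
print(bank_feature(history, "shore"))
```

Each feature returns 0 or 1, so it can be added to the tagger's feature set alongside the usual within-sentence tag-n-gram context. Representing conditions as composable predicates is one design choice among several; it keeps the boolean combinations explicit.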