Paper: Corpus-Based Identification Of Non-Anaphoric Noun Phrases

ACL ID P99-1048
Title Corpus-Based Identification Of Non-Anaphoric Noun Phrases
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1999

Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphoric, which has the potential to improve the efficiency and accuracy of coreference resolution systems. Our algorithm generates lists of non-anaphoric noun phrases and noun phrase patterns from a training corpus and uses them to recognize non-anaphoric noun phrases in new texts. Using 1600 MUC-4 terrorism news articles as the training corpus, our approach achieved 78% recall and 87% precision a...
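
The recognition step described above (matching definite noun phrases in new text against learned phrase lists and patterns) could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not say how the lists are derived, so the sketch assumes they are already available. The list entries, the regular-expression pattern, and the function names are all hypothetical.

```python
import re

# Hypothetical entries for illustration; in the paper the lists are
# generated from the training corpus, not written by hand.
NON_ANAPHORIC_PHRASES = {"the white house", "the news media"}

# Hypothetical noun phrase pattern ("the <word> government"); the form of
# the paper's actual patterns is not given in the abstract.
NON_ANAPHORIC_PATTERNS = [re.compile(r"^the \w+ government$")]


def normalize(noun_phrase: str) -> str:
    """Lowercase the phrase and collapse runs of whitespace."""
    return " ".join(noun_phrase.lower().split())


def is_non_anaphoric(noun_phrase: str) -> bool:
    """Return True if a definite NP matches a listed phrase or pattern."""
    np = normalize(noun_phrase)
    if not np.startswith("the "):
        return False  # this sketch only handles definite NPs
    if np in NON_ANAPHORIC_PHRASES:
        return True
    return any(p.match(np) for p in NON_ANAPHORIC_PATTERNS)


def precision_recall(predicted: set, gold: set) -> tuple:
    """Standard precision and recall over sets of NP identifiers."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

A coreference resolver could call `is_non_anaphoric` before searching for an antecedent and skip the search when it returns True, which is one way the efficiency gain mentioned in the abstract might be realized.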