Paper: Duke's Trainable Information And Meaning Extraction System (Duke TIMES)

ACL ID A97-2004
Title Duke's Trainable Information And Meaning Extraction System (Duke TIMES)
Venue Applied Natural Language Processing Conference
Session Main Conference
Year 1997
Authors

2.1 Tools Used By the System

In addition to WordNet, the system uses IBM's LanguageWare English Dictionary, IBM's Computing Terms Dictionary, and a local dictionary of our choice. The system also uses a gazetteer consisting of approximately 250 names of cities, states, and countries.

2.2 The Tokenizer, the Preprocessor, and the Partial Parser

The Tokenizer accepts ASCII characters as input and produces a stream of tokens (words) as output. It also determines sentence boundaries. The preprocessor tries to identify some important entities like names of companies, proper names, etc. contained in the article. Groups of words that comprise these entities are collected together and considered as one item for all future processing. The Partial Parser produces a sequence of nonoverlapping phrase...
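
The tokenizer and preprocessor steps described in Section 2.2 can be illustrated with a small Python sketch. It is not the Duke TIMES implementation, which the text does not specify. The regular expression, the rule that `.`, `!`, and `?` end a sentence, the three-entry `GAZETTEER`, the `max_len` parameter, and the function names `tokenize` and `group_entities` are all assumptions made for illustration. The sketch only shows the general idea: split ASCII text into tokens and sentences, then collapse word groups that match a gazetteer entry into a single item for later processing.

```python
import re

# Hypothetical miniature gazetteer (the paper's holds roughly 250 place names).
# Entries are stored as tuples of lowercase words.
GAZETTEER = {("new", "york"), ("north", "carolina"), ("durham",)}

# Assumption: these punctuation tokens always close a sentence.
SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Split ASCII text into sentences, each a list of tokens."""
    sentences, current = [], []
    # Words/numbers become one token; each punctuation mark is its own token.
    for tok in re.findall(r"\w+|[^\w\s]", text, flags=re.ASCII):
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(current)
    return sentences


def group_entities(tokens, gazetteer=GAZETTEER, max_len=3):
    """Merge token runs that match a gazetteer entry into one item.

    Uses greedy longest-match: at each position, try the longest
    candidate span first and fall back to shorter ones.
    """
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No gazetteer entry starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    for sentence in tokenize("IBM opened an office in New York. It is large."):
        print(group_entities(sentence))
```

Greedy longest-match is one simple way to treat a multi-word name as a single item. Recognizing company names and other proper names, which the preprocessor is also said to handle, would need additional cues that this sketch does not attempt.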