ACL Anthology Network (All About NLP) (beta): The Association Of Computational Linguistics Anthology Network
ACL ID | M93-1024 |
---|---|
Title | Description Of The LINK System Used For MUC-5 |
Venue | Message Understanding Conference |
Session | Main Conference |
Year | 1993 |
Authors | |
are removed, as are author name lines and COMLINE tag lines. Sentences that are too short to be interesting are also removed.

The Tagger

Because the input in this domain is mixed case, and because many of the proper names that would normally be unknown to the system lexicon are capitalized, the MUC-5 LINK system uses a pre-parse tagger to process, and attempt to identify, the capitalized words that the Tokenizer passes along as strings. The Tagger uses heuristics (a.k.a. hacks) to break these strings apart in several different ways. Tags used include :COMP-NAME for strings that appear to be obvious company names, :LOCATION for city/state pairs, :PERSON-NAME for people's names (when preceded by a title such as Mr, Mrs, VP, or Dr), and :NAME for other names. Some example rules that the tagger ...
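The excerpt breaks off before listing the tagger's actual rules, so the following is only a minimal Python sketch of the kind of heuristics described, not the LINK system's implementation. It assumes the input is already tokenized. The title list (Mr, Mrs, VP, Dr) comes from the text; the company designators (Inc, Corp, Co, Ltd), the state abbreviations, and the function names `tag_capitalized`, `is_capitalized`, and `norm` are hypothetical choices made for illustration.

```python
import re
from typing import List, Tuple

# Titles come from the text; the other two cue lists are hypothetical examples.
TITLES = {"Mr", "Mrs", "VP", "Dr"}
COMPANY_SUFFIXES = {"Inc", "Corp", "Co", "Ltd"}   # assumed company designators
STATE_NAMES = {"Calif", "Mich", "Texas", "NY"}    # assumed state abbreviations


def norm(token: str) -> str:
    """Drop trailing periods so that 'Mr.' and 'Mr' compare equal."""
    return token.rstrip(".")


def is_capitalized(token: str) -> bool:
    """True for word tokens that start with an uppercase letter."""
    return re.fullmatch(r"[A-Z][A-Za-z&.-]*", token) is not None


def tag_capitalized(tokens: List[str]) -> List[Tuple[str, str]]:
    """Group runs of capitalized tokens and label each run with a tag."""
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        if not is_capitalized(tokens[i]):
            i += 1
            continue
        # Collect a maximal run of adjacent capitalized tokens.
        j = i
        while j < len(tokens) and is_capitalized(tokens[j]):
            j += 1
        run = tokens[i:j]

        if norm(run[0]) in TITLES and len(run) > 1:
            # A leading title such as "Mr." signals a person's name.
            tagged.append((":PERSON-NAME", " ".join(run[1:])))
        elif norm(run[-1]) in COMPANY_SUFFIXES:
            # A trailing designator such as "Corp" marks an obvious company.
            tagged.append((":COMP-NAME", " ".join(run)))
        elif (j + 1 < len(tokens) and tokens[j] == ","
              and norm(tokens[j + 1]) in STATE_NAMES):
            # "City , State" is treated as a city/state pair.
            tagged.append((":LOCATION", " ".join(run) + ", " + norm(tokens[j + 1])))
            j += 2  # also consume the comma and the state token
        else:
            # Any other capitalized run gets the generic tag.
            tagged.append((":NAME", " ".join(run)))
        i = j
    return tagged


if __name__ == "__main__":
    tokens = ["Mr.", "Smith", "joined", "Acme", "Corp", "in", "Detroit", ",", "Mich", "."]
    print(tag_capitalized(tokens))
    # [(':PERSON-NAME', 'Smith'), (':COMP-NAME', 'Acme Corp'), (':LOCATION', 'Detroit, Mich')]
```

The checks run in a fixed order (person, then company, then location, then the :NAME fallback), so the first heuristic that fires wins. That mirrors the "hacks" flavor of the description: cheap surface cues applied before parsing, with a catch-all tag for anything the cues miss.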