Paper: Building A Lexical Domain Map From Text Corpora

ACL ID C94-1100
Title Building A Lexical Domain Map From Text Corpora
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1994

For all kinds of terms that can be assigned to the representation of a document, e.g., words, operator-argument pairs, fixed phrases, and proper names, various levels of "regularization" are needed to assure that syntactic or lexical variations of input do not obscure underlying semantic uniformity. Without actually doing semantic analysis, this kind of normalization can be achieved through the following processes (footnote: an alternative, but less efficient, method is to generate all variants, lexical, syntactic, etc., of words/phrases in the queries; Sparck Jones & Tait, 1984): (1) morphological stemming: e.g., retrieving is reduced to retriev; (2) lexicon-based word normalization: e.g., retrieval is reduced to retrieve; (3) operator-argument representation of phrases: e.g., information r...
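
Processes (1) and (2) above can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the suffix list, the one-entry lexicon, and the function names `stem` and `normalize` are hypothetical and are not taken from the paper, and the choice to consult the lexicon before falling back to suffix stripping is likewise an assumption about how the two steps might be combined.

```python
# Hypothetical toy resources: illustrative assumptions, not the paper's data.
SUFFIXES = ("ing", "ed", "es", "s")
LEXICON = {"retrieval": "retrieve"}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(word: str) -> str:
    """Prefer a lexicon entry; otherwise fall back to suffix stripping."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return stem(word)


if __name__ == "__main__":
    for w in ("retrieving", "retrieval", "documents"):
        print(w, "->", normalize(w))
```

With these toy resources, retrieving is stemmed to retriev, while retrieval is mapped through the lexicon to retrieve, mirroring the two examples in the text.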