Paper: N-Gram Cluster Identification During Empirical Knowledge Representation Generation

ACL ID C94-2171
Title N-Gram Cluster Identification During Empirical Knowledge Representation Generation
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1994
Authors

In particular, the use of empirical techniques during the identification and generation of a semantic representation is considered. A key step is the discovery of useful n-grams and of correlations between clusters of these n-grams.

Keywords: knowledge representation, large text corpora, language understanding.

1. BACKGROUND

The primary knowledge extraction and text retrieval conferences (MUC-4, 1992; TREC-1, 1993; TIPSTER, forthcoming) utilise domain-specific queries and templates to identify relevant concepts within a corpus and to extract applicable documents or information. The structures generated by the system discussed in this paper are similar to these domain-specific templates; they could be used for compact representation of information contained in documents for text retrieval pu...