ACL Anthology Network (All About NLP) (beta) The Association Of Computational Linguistics Anthology Network |
ACL ID | P98-1071 |
---|---|
Title | Automatic Extraction of Subcorpora based on Subcategorization Frames from a Part-of-Speech Tagged Corpus |
Venue | Annual Meeting of the Association for Computational Linguistics |
Session | Main Conference |
Year | 1998 |
Authors |
|
This paper presents a method for extracting subcorpora documenting different subcategorization frames for verbs, nouns, and adjectives in the 100-million-word British National Corpus. The extraction tool consists of a set of batch files for use with the Corpus Query Processor (CQP), which is part of the IMS corpus workbench (cf. Christ 1994a,b). A macroprocessor has been developed that allows the user to specify, in a simple input file, which subcorpora are to be created for a given lemma. The resulting subcorpora can be used (1) to provide evidence for the subcategorization properties of a given lemma and to facilitate the selection of corpus lines for lexicographic research, and (2) to determine the frequencies of the different syntactic contexts of each lemma.
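The abstract does not show the input file format or the generated batch files, so the following Python sketch only illustrates the general idea of macro expansion: a short specification per lemma is expanded into one named CQP-style query per subcategorization frame, and the match counts of the resulting subcorpora are turned into relative frequencies. The spec format (`lemma pos frame1,frame2`), the frame names, the tag patterns (loosely modelled on CLAWS C5 tags), the attribute names `lemma`, `pos`, and `word`, and all function names are assumptions made for illustration. They are not the paper's actual macros, and the query strings have not been checked against a CQP installation.

```python
# Hypothetical frame templates: frame name -> CQP-style token pattern that
# follows the target word. Illustrative only, not the paper's definitions.
FRAME_TEMPLATES = {
    "np": '[pos="AT0|DPS"]? [pos="AJ.*"]* [pos="NN.*"]',
    "to_inf": '[word="to"] [pos="V.I"]',
    "that_clause": '[word="that"]',
}

# Assumed part-of-speech patterns for the three word classes in the abstract.
POS_PATTERNS = {"verb": "V.*", "noun": "NN.*", "adj": "AJ.*"}


def parse_spec(text):
    """Parse lines of the (hypothetical) form 'lemma pos frame1,frame2'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        lemma, pos, frames = line.split()
        entries.append((lemma, pos, frames.split(",")))
    return entries


def expand(entries):
    """Expand each entry into one named CQP-style query per frame."""
    queries = {}
    for lemma, pos, frames in entries:
        target = f'[lemma="{lemma}" & pos="{POS_PATTERNS[pos]}"]'
        for frame in frames:
            name = f"{lemma.capitalize()}_{frame}"
            queries[name] = f"{name} = {target} {FRAME_TEMPLATES[frame]};"
    return queries


def relative_frequencies(match_counts):
    """Turn per-subcorpus match counts into shares of the total."""
    total = sum(match_counts.values())
    if total == 0:
        return {name: 0.0 for name in match_counts}
    return {name: count / total for name, count in match_counts.items()}


if __name__ == "__main__":
    spec = "give verb np,that_clause\n# adjectives\nwilling adj to_inf"
    for query in expand(parse_spec(spec)).values():
        print(query)
```

In a setup like the one described, the generated query lines would be written to a batch file and run by CQP, and the size of each resulting subcorpus would supply the counts passed to `relative_frequencies`. How the original tool handles these steps is not stated in the abstract.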