Paper: Large-Scale Induction And Evaluation Of Lexical Resources From The Penn-II Treebank

ACL ID P04-1047
Title Large-Scale Induction And Evaluation Of Lexical Resources From The Penn-II Treebank
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2004
Authors

In this paper we present a methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank. We extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG category-based subcategorisation frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach does not predefine frames, associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. We extract 3586 verb lemmas, 14348 semantic form types (an average of 4 per lemma) with ...
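
The abstract states that probabilities are associated with frames conditional on the lemma. As a minimal sketch of what such an estimate could look like, the Python below computes relative-frequency estimates of P(frame | lemma) from (lemma, frame) observations. The function name `frame_probabilities`, the tuple encoding of frames, and the toy data are all assumptions made for illustration; they are not taken from the paper, and the authors' actual estimation procedure may differ.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency.

    `observations` is an iterable of (lemma, frame) pairs, one per verb
    occurrence. Frames are represented here as tuples of grammatical
    function labels (a hypothetical encoding chosen for this sketch).
    """
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1

    probs = {}
    for lemma, frame_counts in counts.items():
        total = sum(frame_counts.values())
        # Normalise the counts for each lemma so they sum to 1.
        probs[lemma] = {frame: n / total for frame, n in frame_counts.items()}
    return probs


# Hypothetical toy observations, not data from the Penn-II Treebank.
observations = [
    ("give", ("subj", "obj", "obj2")),
    ("give", ("subj", "obj", "obl:to")),
    ("give", ("subj", "obj", "obl:to")),
    ("give", ("subj", "obj")),
    ("sleep", ("subj",)),
]
probs = frame_probabilities(observations)
```

Because no frame inventory is fixed in advance, any frame seen in the observations receives a probability, which matches the abstract's statement that frames are not predefined. Including the preposition in the oblique label (as in `obl:to`) is one possible way to model the "with preposition information" variant.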