Paper: Using Linguistic Principles To Recover Empty Categories

ACL ID P04-1082
Title Using Linguistic Principles To Recover Empty Categories
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2004

This paper describes an algorithm for detecting empty nodes in the Penn Treebank (Marcus et al., 1993), finding their antecedents, and assigning them function tags, without access to lexical information such as valency. Unlike previous approaches to this task, the current method is not corpus-based, but rather makes use of the principles of early Government-Binding theory (Chomsky, 1981), the syntactic theory that underlies the annotation. Using the evaluation metric proposed by Johnson (2002), this approach outperforms previously published approaches on both detection of empty categories and antecedent identification, given either annotated input stripped of empty categories or the output of a parser. Some problems with this evaluation metric are noted and an alternative is proposed alon...