Paper: SystemT: An Algebraic Approach to Declarative Information Extraction

ACL ID P10-1014
Title SystemT: An Algebraic Approach to Declarative Information Extraction
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2010
Authors

As information extraction (IE) becomes more central to enterprise applications, rule-based IE engines have become increasingly important. In this paper, we describe SystemT, a rule-based IE system whose basic design removes the expressivity and performance limitations of current systems based on cascading grammars. SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules. We compare SystemT’s approach against cascading grammars, both theoretically and with a thorough experimental evaluation. Our results show that SystemT can deliver result quality comparable to the state-of-the-art and an order of magnitude higher annotation throughput.