Paper: Modelling of a Gazetteer Look-up Component

ACL ID I05-2028
Title Modelling of a Gazetteer Look-up Component
Venue International Joint Conference on Natural Language Processing
Session poster-demo-tutorial
Year 2005
  • Jakub Piskorski (German Research Center for Artificial Intelligence, Saarbrücken, Germany)

Several data structures can be used to implement a gazetteer, e.g. hash tables, tries and finite-state automata. Finite-state automata require less memory than the alternatives and guarantee efficient access to the data (1). In this paper, we compare two finite-state data structures for implementing a gazetteer look-up component. The first uses numbered automata with multiple initial states combined with an external table (2). The second converts the input data so that the gazetteer is modelled solely as a single finite-state automaton, without any auxiliary storage device tailored to it. Further, we explore the impact of transition jamming, an equivalence transformation on finite-state devices (3), on the size of the automata. The paper is organized as fo...
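
The two designs contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and makes several simplifying assumptions: a plain trie stands in for a minimized automaton, a single initial state is used instead of multiple ones, transition jamming is not modelled, and all names (`Node`, `build`, `word_index`, `lookup_numbered`, `lookup_single`, `SEP`) and the sample entries are hypothetical.

```python
SEP = "\x1f"  # hypothetical separator, assumed never to occur in a keyword


class Node:
    def __init__(self):
        self.edges = {}     # character -> Node
        self.final = False  # True if a stored string ends here
        self.count = 0      # number of stored strings reachable from here


def _number(node):
    # Annotate each state with the size of its right language.
    node.count = int(node.final) + sum(_number(c) for c in node.edges.values())
    return node.count


def build(words):
    # Build a trie (an unminimized acyclic automaton) and number its states.
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    _number(root)
    return root


def word_index(root, word):
    # Rank of `word` among the stored strings in code-point order, or None.
    node, index = root, 0
    for ch in word:
        if ch not in node.edges:
            return None
        if node.final:
            index += 1  # a shorter stored string ends here and sorts first
        index += sum(t.count for c, t in node.edges.items() if c < ch)
        node = node.edges[ch]
    return index if node.final else None


# Sample gazetteer (invented for illustration).
gazetteer = {"Berlin": ["city"], "Bonn": ["city"], "Washington": ["city", "person"]}

# Design 1: numbered automaton over keywords plus an external table.
keys = sorted(gazetteer)
numbered = build(keys)
table = [gazetteer[k] for k in keys]


def lookup_numbered(word):
    i = word_index(numbered, word)
    return [] if i is None else table[i]


# Design 2: one automaton over "keyword SEP annotation" strings, no table.
single = build(k + SEP + a for k, anns in gazetteer.items() for a in anns)


def lookup_single(word):
    node = single
    for ch in word + SEP:
        node = node.edges.get(ch)
        if node is None:
            return []
    found, stack = [], [(node, "")]
    while stack:  # enumerate every annotation suffix after the separator
        n, suffix = stack.pop()
        if n.final:
            found.append(suffix)
        stack.extend((child, suffix + c) for c, child in n.edges.items())
    return sorted(found)
```

In this sketch both `lookup_numbered("Washington")` and `lookup_single("Washington")` yield the same annotations. The difference is where they live: in a table indexed by the keyword's number, or in the automaton's own paths.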