Paper: A Finite State And Data-Oriented Method For Grapheme To Phoneme Conversion

ACL ID A00-2040
Title A Finite State And Data-Oriented Method For Grapheme To Phoneme Conversion
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2000
Authors
  • Gosse Bouma (University of Groningen, Groningen, The Netherlands)

A finite-state method, based on leftmost longest-match replacement, is presented for segmenting words into graphemes, and for converting graphemes into phonemes. A small set of hand-crafted conversion rules for Dutch achieves a phoneme accuracy of over 93%. The accuracy of the system is further improved by using transformation-based learning. The phoneme accuracy of the best system (using a large set of rule templates and a 'lazy' variant of Brill's algorithm), trained on only 40K words, reaches 99%.
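
The leftmost longest-match idea from the abstract can be illustrated with a minimal Python sketch. This is not the paper's finite-state implementation: it uses a plain scanning loop instead of transducers, and the grapheme inventory `GRAPHEMES`, the conversion table `TOY_TABLE`, and the function names `segment` and `convert` are hypothetical, chosen only for illustration. The phoneme symbols are toy values, not the paper's rule set.

```python
# Hypothetical multi-character grapheme inventory (illustration only,
# not the inventory used in the paper).
GRAPHEMES = ["sch", "ch", "oe", "ie", "ij", "aa", "ee", "oo", "uu", "ng"]

# Toy context-free conversion table (assumed symbols; the paper's rules
# are context-sensitive and far more complete).
TOY_TABLE = {"sch": "sx", "oo": "o", "oe": "u"}


def segment(word, graphemes=GRAPHEMES):
    """Split a word into graphemes by leftmost longest match.

    At each position, scanning left to right, take the longest string
    that is a known grapheme; fall back to a single character.
    """
    inventory = set(graphemes)
    max_len = max(len(g) for g in graphemes)
    segments = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if n == 1 or chunk in inventory:
                segments.append(chunk)
                i += n
                break
    return segments


def convert(segments, table=TOY_TABLE):
    """Map each grapheme to a phoneme symbol; unknown graphemes pass through."""
    return [table.get(g, g) for g in segments]


print(segment("school"))           # ['sch', 'oo', 'l']
print(convert(segment("school")))  # ['sx', 'o', 'l']
```

Trying the longest candidate first is what keeps "sch" from being split into "s" and "ch". A real system would replace the lookup table with context-sensitive rules, as the abstract describes.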