Paper: An MDL-based approach to extracting subword units for grapheme-to-phoneme conversion

ACL ID N10-1107
Title An MDL-based approach to extracting subword units for grapheme-to-phoneme conversion
Venue Human Language Technologies
Session Main Conference
Year 2010
Authors

We address a key problem in grapheme-to- phoneme conversion: the ambiguity in map- ping grapheme units to phonemes. Rather than using single letters and phonemes as units, we propose learning chunks, or subwords, to re- duce ambiguity. This can be interpreted as learning a lexicon of subwords that has min- imum description length. We implement an algorithm to build such a lexicon, as well as a simple decoder that uses these subwords.