Paper: Unlimited Vocabulary Speech Recognition For Agglutinative Languages

ACL ID N06-1062
Title Unlimited Vocabulary Speech Recognition For Agglutinative Languages
Venue Human Language Technologies
Session Main Conference
Year 2006
Authors

It is practically impossible to build a word-based lexicon for speech recognition in agglutinative languages that would cover all the relevant words. The problem is that words are generally built by concatenating several prefixes and suffixes to the word roots. Together with compounding and inflections this leads to millions of different, but still frequent word forms. Due to inflections, ambiguity and other phenomena, it is also not trivial to automatically split the words into meaningful parts. Rule-based morphological analyzers can perform this splitting, but due to the handcrafted rules, they also suffer from an out-of-vocabulary problem. In this paper we apply a recently proposed fully automatic and rather language and vocabulary independent way to build sub-word lexica f...