Paper: Multilingual Speech Databases At LDC

ACL ID H94-1006
Title Multilingual Speech Databases At LDC
Venue Human Language Technologies
Session Main Conference
Year 1994

As multilingual products and technology grow in importance, the Linguistic Data Consortium (LDC) intends to provide the resources needed for research and development activities, especially in telephone-based, small-vocabulary recognition applications; language identification research; and large-vocabulary continuous speech recognition research. The POLYPHONE corpora, a multilingual "database of databases," are specifically designed to meet the needs of telephone application development and testing. Data sets from many of the world's commercially important languages will be available within the next few years. Language identification corpora will be large sets of spontaneous telephone speech in several languages with a wide variety of speakers, channels, and handsets. One corpus is now ...