Paper: Data Warehouse, Bronze, Gold, STEC, Software

ACL ID W14-2213
Title Data Warehouse, Bronze, Gold, STEC, Software
Venue Workshop on the Use of Computational Methods in the Study of Endangered Languages
Year 2014

We are building an analytical data warehouse for linguistic data, primarily lexicons and phonological data, for languages in the Asia-Pacific region. This paper briefly outlines the project, making the point that the need for improved technology for endangered and low-density language data extends well beyond completion of fieldwork. We suggest that shared task evaluation challenges (STECs) are an appropriate model to follow for creating this technology, and that stocking data warehouses with clean bronze-standard data and baseline tools, no mean task, is an effective way to elicit the broad collaboration from linguists and computer scientists needed to create the gold-standard data that STECs require.