Paper: Quantitative Methods For Classifying Writing Systems

ACL ID N06-2030
Venue Human Language Technologies
Session Short Paper
Year 2006
  • Gerald Penn (University of Toronto, Toronto ON)
  • Travis Choma (Cognitive Science Center Amsterdam, Amsterdam, The Netherlands)

We describe work in progress on using quantitative methods to classify writing systems according to Sproat’s (2000) classification grid using unannotated data. We specifically propose two quantitative tests for determining the type of phonography in a writing system, and its degree of logography, respectively.

1 Background

If you understood all of the world’s languages, you would still not be able to read many of the texts that you find on the world wide web, because they are written in non-Roman scripts that have been arbitrarily encoded for electronic transmission in the absence of an accepted standard. This very modern nuisance reflects a dilemma as ancient as writing itself: the association between a language as it is spoken and the language as it is written has a sort of...