Paper: Automatically Identifying Computationally Relevant Typological Features

ACL ID I08-2093
Title Automatically Identifying Computationally Relevant Typological Features
Venue International Joint Conference on Natural Language Processing
Session Main Conference
Year 2008
Authors

In this paper we explore the potential for identifying computationally relevant typological features from a multilingual corpus of language data built from readily available language data collected off the Web. Our work builds on previous structural projection work, where we extend the work of projection to building individual CFGs for approximately 100 languages. We then use the CFGs to discover the values of typological parameters such as word order, the presence or absence of definite and indefinite determiners, etc. Our methods have the potential of being extended to many more languages and parameters, and can have significant effects on current research focused on tool and resource development for low-density languages and grammar induction from raw corpora.
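
As a rough illustration of how a word order value might be read off a projected CFG, the sketch below tallies how often one symbol precedes another on the right-hand side of weighted rules. It is not the authors' procedure: the rule format, the symbol names (VB, NP-OBJ, NP-SBJ), the toy counts, and the min_ratio threshold are all assumptions made for this example.

```python
def relative_order(rules, first, second, min_ratio=2.0):
    """Guess whether `first` tends to precede `second` within CFG rules.

    rules: iterable of (lhs, rhs, count), where rhs is a tuple of symbols
           and count is how often the rule was observed (hypothetical format).
    Returns "before", "after", or "undetermined".
    """
    before = 0
    after = 0
    for _lhs, rhs, count in rules:
        # Only rules that contain both symbols say anything about their order.
        if first in rhs and second in rhs:
            if rhs.index(first) < rhs.index(second):
                before += count
            else:
                after += count
    # Require a clear majority (assumed threshold) before committing to a value.
    if before > 0 and before >= min_ratio * after:
        return "before"
    if after > 0 and after >= min_ratio * before:
        return "after"
    return "undetermined"


# Toy rule counts for an imaginary verb-final language (invented data).
toy_rules = [
    ("S", ("NP-SBJ", "VP"), 40),
    ("VP", ("NP-OBJ", "VB"), 30),
    ("VP", ("VB", "NP-OBJ"), 5),
    ("NP-OBJ", ("NN",), 35),
]

verb_object = relative_order(toy_rules, "VB", "NP-OBJ")
subject_vp = relative_order(toy_rules, "NP-SBJ", "VP")
```

With these toy counts the verb mostly follows its object, so the function would report the verb as coming "after", which corresponds to an OV setting. A similar count over rules containing a determiner symbol could serve as a crude test for the presence of determiners, again under the same assumptions.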