Paper: Development of a Persian Syntactic Dependency Treebank

ACL ID N13-1031
Title Development of a Persian Syntactic Dependency Treebank
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013

This paper describes the annotation process and linguistic properties of the Persian syntactic dependency treebank. The treebank consists of approximately 30,000 sentences annotated with syntactic roles in addition to morpho-syntactic features. One of the unique features of this treebank is that there are almost 4800 distinct verb lemmas in its sentences, making it a valuable resource for educational goals. The treebank is constructed with a bootstrapping approach, using available tagging and parsing tools and then manually correcting the annotations. The data is split into standard train, development and test sets in the CoNLL dependency format and is freely available to researchers.
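
As an illustration of how data in a CoNLL dependency format might be read and how a distinct-verb-lemma count could be computed, here is a minimal Python sketch. It assumes the ten tab-separated columns of the CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and assumes that verbs carry the coarse tag "V"; the abstract confirms neither detail, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Set

# Assumed CoNLL-X column order; the abstract only says "CoNLL dependency format".
COLUMNS = ["id", "form", "lemma", "cpostag", "postag",
           "feats", "head", "deprel", "phead", "pdeprel"]

Token = Dict[str, str]


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of token dicts; blank lines separate sentences."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        sentence.append(dict(zip(COLUMNS, fields)))
    # Emit the last sentence when the input lacks a trailing blank line.
    if sentence:
        yield sentence


def distinct_verb_lemmas(sentences: Iterable[List[Token]],
                         verb_tag: str = "V") -> Set[str]:
    """Collect the set of lemmas whose coarse POS tag equals verb_tag (assumed "V")."""
    return {tok["lemma"]
            for sent in sentences
            for tok in sent
            if tok.get("cpostag") == verb_tag}
```

Applied to a file object opened on the training split, `len(distinct_verb_lemmas(read_conll(f)))` would give the number of distinct verb lemmas under these assumptions; the tag name and column layout should be checked against the released data before relying on the result.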