Paper: Splitting of Compound Terms in non-Prototypical Compounding Languages

ACL ID W14-5702
Title Splitting of Compound Terms in non-Prototypical Compounding Languages
Venue Computational Approaches to Compound Analysis
Session
Year 2014
Authors

Compounding occurs in a wide variety of languages, to differing degrees. The rate of compounds in a text depends on the language, but also on the genre and the domain. Scientific and technical texts are especially conducive to compounding, even in languages that are not traditionally regarded as highly compounding. In this article we address compound splitting of specialized terms. We propose a multilingual method for compound recognition and splitting that uses corpus frequencies, lexical data and, optionally, linguistic rules. It is a supervised method that requires a small number of segmented compounds as input. We evaluate the method on two languages that rarely serve as material for automatic splitting systems: English and Russian. The results obtained are co...
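
To make the idea of splitting by corpus frequencies concrete, the following is a minimal sketch of a generic frequency-based splitter. It is not the method proposed in the paper: it uses the common heuristic of scoring a candidate segmentation by the geometric mean of its parts' corpus frequencies, and it leaves out the lexical data, the optional linguistic rules and the supervised component mentioned in the abstract. The function name `split_compound`, the `min_len` parameter and the toy frequency table are all hypothetical and chosen only for illustration.

```python
from math import prod


def split_compound(word, freq, min_len=3):
    """Return the best-scoring segmentation of `word` as a list of parts.

    `freq` maps lowercase word forms to corpus counts (a toy assumption here).
    A segmentation is scored by the geometric mean of its parts' counts;
    the unsplit word is scored by its own count.
    """
    word = word.lower()

    def candidates(rest):
        # Enumerate every way to cut `rest` into attested parts of
        # at least `min_len` characters.
        if not rest:
            yield []
            return
        for i in range(min_len, len(rest) + 1):
            head = rest[:i]
            if freq.get(head, 0) > 0:
                for tail in candidates(rest[i:]):
                    yield [head] + tail

    # Baseline: keep the word whole.
    best_parts, best_score = [word], float(freq.get(word, 0))
    for parts in candidates(word):
        if len(parts) < 2:
            continue  # the unsplit word is already the baseline
        score = prod(freq[p] for p in parts) ** (1 / len(parts))
        if score > best_score:
            best_parts, best_score = parts, score
    return best_parts


# Toy counts, invented for this example only.
toy_freq = {
    "electro": 50,
    "cardio": 40,
    "gram": 60,
    "card": 100,
    "electrocardiogram": 2,
    "table": 30,
}

print(split_compound("electrocardiogram", toy_freq))
print(split_compound("table", toy_freq))
```

With these toy counts, the specialized term is cut into three attested components because their geometric mean exceeds the count of the whole word, while a word with no attested segmentation is returned unsplit. A real system would need to handle linking elements and bound neoclassical forms, which this sketch ignores.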