Paper: Language-independent compound splitting with morphological operations

ACL ID P11-1140
Title Language-independent compound splitting with morphological operations
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011

Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound parts and morphological operations needed to split compounds into their compound parts. The method uses a bilingual corpus to learn the morphological operations required to split a compound into its parts. Furthermore, monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate our method within a machine translation task and show significant improvemen...
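
To make the idea of splitting with morphological operations concrete, here is a minimal sketch. It is not the paper's algorithm: the paper learns its operations from a bilingual corpus and its part candidates from monolingual corpora, whereas this sketch assumes both are already given as small hand-written sets. The German examples, the names `split_compound`, `VOCAB`, and `OPERATIONS`, and the `min_len` threshold are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Set, Tuple

# Assumed inputs (hand-written here; the paper learns these from corpora).
# VOCAB stands in for the filtered set of compound part candidates.
VOCAB: Set[str] = {"verkehr", "zeichen", "arbeit", "amt"}

# Each operation is (strip, add): remove `strip` from the end of a
# non-final part and append `add` to recover its base form.
# ("", "") is plain concatenation; ("s", "") undoes a linking "s".
OPERATIONS: List[Tuple[str, str]] = [("", ""), ("s", "")]


def split_compound(
    word: str,
    vocab: Set[str],
    operations: List[Tuple[str, str]],
    min_len: int = 3,
) -> Optional[List[str]]:
    """Return the base forms of the parts of `word`, or None if no split is found."""
    word = word.lower()
    # A known word is kept whole.
    if word in vocab:
        return [word]
    # Try every split point that leaves at least `min_len` characters per side.
    for i in range(min_len, len(word) - min_len + 1):
        prefix, rest = word[:i], word[i:]
        for strip, add in operations:
            if not prefix.endswith(strip):
                continue
            base = prefix[: len(prefix) - len(strip)] + add
            if base not in vocab:
                continue
            # Recursively split the remainder of the word.
            tail = split_compound(rest, vocab, operations, min_len)
            if tail is not None:
                return [base] + tail
    return None


print(split_compound("Verkehrszeichen", VOCAB, OPERATIONS))  # ['verkehr', 'zeichen']
print(split_compound("Arbeitsamt", VOCAB, OPERATIONS))       # ['arbeit', 'amt']
```

The sketch returns the first split it finds. A practical system would more plausibly score competing splits, for example by part frequency, but that step is outside what the abstract describes.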