Paper: Source Language Markers in EUROPARL Translations

ACL ID C08-1118
Title Source Language Markers in EUROPARL Translations
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2008
Authors

This paper shows that it is very often possible to identify the source language of medium-length speeches in the EUROPARL corpus on the basis of frequency counts of word n-grams (87.2%-96.7% accuracy depending on classification method). The paper also examines in detail which positive markers are most powerful and identifies a number of linguistic aspects as well as culture- and domain-related ones.
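
To make the idea of classifying by word n-gram frequency counts concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not name the classification methods used, so a multinomial naive Bayes scorer with add-one smoothing stands in as an assumption. The function names (`word_ngrams`, `train`, `classify`), the whitespace tokenisation, the unigram-plus-bigram setting, and the two toy training sentences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max as tuples (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


def train(labelled_texts, n_max=2):
    """Accumulate n-gram frequency counts per source-language label."""
    counts = defaultdict(Counter)
    for label, text in labelled_texts:
        counts[label].update(word_ngrams(text, n_max))
    return counts


def classify(text, counts, n_max=2):
    """Pick the label with the highest add-one-smoothed log-likelihood.

    Assumption: naive Bayes with uniform priors is a stand-in classifier,
    not necessarily one of the methods evaluated in the paper.
    """
    vocab = set()
    for label_counts in counts.values():
        vocab.update(label_counts)
    vocab_size = len(vocab)

    best_label, best_score = None, -math.inf
    for label, label_counts in counts.items():
        total = sum(label_counts.values())
        score = sum(
            math.log((label_counts[gram] + 1) / (total + vocab_size))
            for gram in word_ngrams(text, n_max)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data invented for this sketch; these are NOT markers reported in the paper.
toy_training = [
    ("de", "i am of the opinion that we must act now"),
    ("fr", "it is indeed necessary that we act on this matter"),
]
toy_counts = train(toy_training)
print(classify("i am of the opinion that this is right", toy_counts))
```

In practice the counts would come from many speeches per source language, and the n-grams with the largest per-language frequency differences would be the candidates for the "positive markers" the abstract refers to.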