Paper: WebCAGe – A Web-Harvested Corpus Annotated with GermaNet Senses

ACL ID: E12-1039
Title: WebCAGe – A Web-Harvested Corpus Annotated with GermaNet Senses
Venue: Annual Meeting of the European Chapter of the Association for Computational Linguistics
Session: Main Conference
Year: 2012
Authors

This paper describes an automatic method for creating a domain-independent sense-annotated corpus harvested from the web. As a proof of concept, this method has been applied to German, a language for which sense-annotated corpora are still in short supply. The sense inventory is taken from the German wordnet GermaNet. The web-harvesting relies on an existing mapping of GermaNet to the German version of the web-based dictionary Wiktionary. The data obtained by this method constitute WebCAGe (short for: Web-Harvested Corpus Annotated with GermaNet Senses), a resource which currently represents the largest sense-annotated corpus available for German. While the present paper focuses on one particular language, the method as such is language-independent.

1 Motivation

The availability of la...