Paper: Improving Data Driven Wordclass Tagging by System Combination

ACL ID C98-1078
Title Improving Data Driven Wordclass Tagging by System Combination
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1998
Authors

In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.
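
The simplest form of the voting idea mentioned in the abstract can be illustrated with a minimal sketch of plain majority voting over per-token tagger outputs. This is an illustration only, not the paper's implementation: the function names, the example tags, and the tie-breaking rule (prefer the choice of a designated fallback tagger) are all assumptions made here, and the paper's weighted voting strategies and second stage classifiers are not reproduced.

```python
from collections import Counter


def majority_vote(tags, fallback_index=0):
    """Pick the tag proposed by the most taggers for one token.

    Assumption: ties are broken in favour of the tag chosen by the
    tagger at `fallback_index`, if that tag is among the tied winners;
    otherwise the first tied tag in input order is returned.
    """
    counts = Counter(tags)
    best = max(counts.values())
    winners = [tag for tag in counts if counts[tag] == best]
    if len(winners) == 1:
        return winners[0]
    fallback = tags[fallback_index]
    return fallback if fallback in winners else winners[0]


def combine(outputs, fallback_index=0):
    """Combine several tag sequences (one per tagger, equal length)."""
    return [majority_vote(list(column), fallback_index)
            for column in zip(*outputs)]


# Hypothetical outputs of four taggers for a three-token sentence.
hmm_tags = ["DT", "NN", "VB"]
memory_based_tags = ["DT", "NN", "NN"]
transformation_tags = ["DT", "JJ", "VB"]
maxent_tags = ["DT", "NN", "NN"]

combined = combine(
    [hmm_tags, memory_based_tags, transformation_tags, maxent_tags]
)
print(combined)
```

With four component taggers a 2-2 split is possible, which is why the sketch needs an explicit tie-breaking rule; in the example above the third token is such a tie and is resolved by the first tagger's choice.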