Paper: Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages

ACL ID P98-1063
Title Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1998
Authors

In this paper we present the results of combining stochastic and rule-based disambiguation methods applied to the Basque language. The methods used for disambiguation are the Constraint Grammar (CG) formalism and an HMM-based tagger developed within the MULTEXT project. As Basque is an agglutinative language, a morphological analyser is needed to attach all possible readings to each word. CG rules are then applied using all the morphological features, which decreases the morphological ambiguity of the texts. Finally, we use the MULTEXT project tools to select just one of the remaining possible tags. Using only the stochastic method, the error rate is about 14%, but accuracy may be increased by about 2% by enriching the lexicon with the unknown words. When both methods are combin...
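The three stages named in the abstract (morphological analysis, rule-based pruning, stochastic selection) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon, the coarse tag names, the single pruning rule, and the transition probabilities are invented, and the functions `analyse`, `apply_constraints`, `viterbi`, and `disambiguate` are hypothetical stand-ins, not the Basque analyser, the CG grammar, or the MULTEXT tools themselves. Emission probabilities are omitted, so the HMM step here scores tag transitions only.

```python
from math import log

# Hypothetical toy lexicon: each word form maps to all of its possible tags.
LEXICON = {
    "etxea": {"NOUN"},
    "handia": {"ADJ", "NOUN"},
    "da": {"VERB", "AUX"},
}

# Invented tag-transition probabilities (not estimated from any corpus).
TRANS = {
    ("<s>", "NOUN"): 0.6,
    ("NOUN", "ADJ"): 0.5,
    ("NOUN", "NOUN"): 0.2,
    ("ADJ", "AUX"): 0.6,
    ("ADJ", "VERB"): 0.3,
}


def analyse(word):
    """Stand-in for the morphological analyser: return every reading."""
    return set(LEXICON.get(word, {"UNK"}))


def apply_constraints(readings):
    """Stand-in for CG rules: discard readings ruled out by context.

    A reading is only removed from a word that is still ambiguous, so
    every word keeps at least one reading.
    """
    result = [set(r) for r in readings]
    for i in range(1, len(result)):
        # Invented rule: after an unambiguous NOUN, drop the NOUN reading.
        if result[i - 1] == {"NOUN"} and len(result[i]) > 1:
            result[i].discard("NOUN")
    return result


def trans(prev, tag):
    """Log-probability of a transition, with a small floor for unseen pairs."""
    return log(TRANS.get((prev, tag), 0.01))


def viterbi(readings):
    """Pick the best tag sequence among the readings left after pruning."""
    best = {"<s>": (0.0, [])}  # tag -> (score, best path ending in tag)
    for candidates in readings:
        new_best = {}
        for tag in sorted(candidates):
            score, path = max(
                ((s + trans(prev, tag), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[tag] = (score, path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


def disambiguate(words):
    """Run the three stages in order: analyse, prune, select one tag each."""
    readings = [analyse(w) for w in words]
    pruned = apply_constraints(readings)
    return viterbi(pruned)


print(disambiguate(["etxea", "handia", "da"]))
```

In this sketch the rule-based stage only shrinks the set of candidate tags, and the stochastic stage chooses among whatever remains, which mirrors the order of operations the abstract describes.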