Paper: Tagset Reduction Without Information Loss

ACL ID P95-1039
Title Tagset Reduction Without Information Loss
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1995
Authors

A technique for reducing a tagset used for n-gram part-of-speech disambiguation is introduced and evaluated in an experiment. The technique ensures that all information that is provided by the original tagset can be restored from the reduced one. This is crucial, since we are interested in the linguistically motivated tags for part-of-speech disambiguation. The reduced tagset needs fewer parameters for its statistical model and allows more accurate parameter estimation. Additionally, there is a slight but not significant improvement of tagging accuracy.

1 Motivation

Statistical part-of-speech disambiguation can be efficiently done with n-gram models (Church, 1988; Cutting et al., 1992). These models are equivalent to Hidden Markov Models (HMMs) (Rabiner, 1989) of order n - 1. Th...