Paper: Extended Models And Tools For High-Performance Part-Of-Speech

ACL ID C00-1004
Title Extended Models And Tools For High-Performance Part-Of-Speech
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000

Statistical part-of-speech (POS) taggers achieve high accuracy and robustness when based on large-scale manually tagged corpora. However, enhancements of the learning models are necessary to achieve better performance. We are developing a learning tool for a Japanese morphological analyzer called ChaSen. Currently we use a fine-grained POS tag set with about 500 tags. Applying a normal trigram model to this tag set would require an unrealistically large corpus. Even for a bigram model, we cannot prepare a moderately sized annotated corpus when we take all the tags as distinct. A usual technique for coping with such fine-grained tags is to reduce the size of the tag set by grouping the tags into equivalence classes. We introduce the concept of position-wise groupin...
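
The data-sparseness argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: the figure of 500 tags comes from the abstract, while the grouped tag-set size of 50, the tag names, and the helper functions are hypothetical and are not taken from the paper or from ChaSen. It shows plain equivalence-class grouping, not the position-wise variant the abstract goes on to introduce.

```python
def ngram_event_count(num_tags: int, order: int) -> int:
    """Number of distinct tag n-grams of the given order over a tag set."""
    return num_tags ** order


def group_tags(tag_sequence, tag_to_class):
    """Map each fine-grained tag to its equivalence class.

    Tags without an entry in tag_to_class are kept unchanged.
    """
    return [tag_to_class.get(tag, tag) for tag in tag_sequence]


FULL_TAGS = 500    # "about 500 tags", from the abstract
GROUPED_TAGS = 50  # hypothetical size after grouping (an assumption)

# Possible tag bigrams and trigrams before and after grouping.
full_bigrams = ngram_event_count(FULL_TAGS, 2)         # 250,000
full_trigrams = ngram_event_count(FULL_TAGS, 3)        # 125,000,000
grouped_trigrams = ngram_event_count(GROUPED_TAGS, 3)  # 125,000

# Hypothetical fine-grained tags collapsed into coarser classes.
TAG_TO_CLASS = {
    "noun-common": "noun",
    "noun-proper": "noun",
    "verb-main": "verb",
    "verb-auxiliary": "verb",
}

grouped_sequence = group_tags(
    ["noun-proper", "particle", "verb-main"], TAG_TO_CLASS
)
```

With all 500 tags kept distinct, a trigram model has 125 million possible tag triples to estimate, which is why an annotated corpus of realistic size cannot cover them. Under the assumed grouping into 50 classes the count drops by a factor of 1,000.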