Paper: Probabilistic And Rule-Based Tagger Of An Inflective Language - A Comparison

ACL ID: A97-1017
Title: Probabilistic And Rule-Based Tagger Of An Inflective Language - A Comparison
Venue: Applied Natural Language Processing Conference
Session: Main Conference
Year: 1997
Authors

We present results of probabilistic tagging of Czech texts in order to show how these techniques work for one of the highly morphologically ambiguous inflective languages. After a description of the tag system used, we show the results of four experiments using a simple probabilistic model to tag Czech texts (a unigram experiment, two bigram experiments, and a trigram one). For comparison, we have applied the same code and settings to tag an English text (another four experiments), using training and test data of the same size as in the Czech experiments, in order to avoid any doubt concerning the validity of the comparison. The experiments use the source channel model and maximum likelihood training on a Czech hand-tagged corpus and on the tagged Wall Street Journal (WSJ) corpus from the LDC collection. The expe...
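To make "source channel model with maximum likelihood training" concrete, here is a minimal sketch of a bigram tagger in that style. It is not the authors' code. It assumes the usual factorization T̂ = argmax_T P(T)·P(W|T), with P(T) approximated by bigram tag transitions P(t_i | t_{i-1}) and P(W|T) by per-word emissions P(w_i | t_i), both estimated as relative frequencies (maximum likelihood) with no smoothing, and decoded with the Viterbi algorithm. The toy English corpus, the tag names, and the helper names `train`, `viterbi`, and `START` are hypothetical illustrations.

```python
from collections import defaultdict
import math

START = "<s>"  # hypothetical sentence-start pseudo-tag


def normalize(table):
    """Turn nested counts into relative frequencies (maximum likelihood estimates)."""
    probs = {}
    for key, row in table.items():
        total = sum(row.values())
        probs[key] = {x: c / total for x, c in row.items()}
    return probs


def train(tagged_sentences):
    """Estimate P(t | t_prev) and P(w | t) from a hand-tagged corpus."""
    trans = defaultdict(lambda: defaultdict(int))  # counts c(t_prev, t)
    emit = defaultdict(lambda: defaultdict(int))   # counts c(t, w)
    for sent in tagged_sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return normalize(trans), normalize(emit)


def viterbi(words, trans, emit):
    """Most probable tag sequence under the bigram model, or None if no path exists."""
    if not words:
        return []
    tags = list(emit)
    # best[i][t] = (log prob of best path ending in tag t at position i, previous tag)
    best = [{}]
    for t in tags:
        p_t = trans.get(START, {}).get(t, 0.0)
        p_w = emit[t].get(words[0], 0.0)
        if p_t > 0 and p_w > 0:
            best[0][t] = (math.log(p_t) + math.log(p_w), None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            p_w = emit[t].get(words[i], 0.0)
            if p_w == 0:
                continue
            candidates = [
                (score + math.log(trans[prev][t]), prev)
                for prev, (score, _) in best[i - 1].items()
                if trans.get(prev, {}).get(t, 0.0) > 0
            ]
            if candidates:
                score, prev = max(candidates)
                best[i][t] = (score + math.log(p_w), prev)
    if not best[-1]:
        return None  # unseen word or transition: this sketch does no smoothing
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy training data; "walks" is ambiguous between VBZ and NNS.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("walks", "VBZ")],
    [("the", "DT"), ("walks", "NNS"), ("end", "VBP")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
trans, emit = train(corpus)
print(viterbi(["the", "dog", "walks"], trans, emit))
print(viterbi(["the", "walks", "end"], trans, emit))
```

Here the context (the preceding tag) resolves the ambiguity of "walks". A unigram variant would drop the transition term, and a trigram variant would condition on the two preceding tags; for a highly inflective language the tag set is much larger, so unsmoothed counts like these would leave many transitions unseen.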