Title Morphological Tagging: Data Vs. Dictionaries
Year 2000
  • Jan Hajič (Johns Hopkins University, Baltimore MD)

Part of Speech tagging for English seems to have reached human levels of error, but full morphological tagging for inflectionally rich languages, such as Romanian, Czech, or Hungarian, is still an open problem, and the results are far from satisfactory. This paper presents results obtained by using a universalized exponential feature-based model for five such languages. It focuses on the data sparseness issue, which is especially severe for such languages (all the more so because no extensive annotated data exist for them). In conclusion, we argue strongly that, under such circumstances, using an independent morphological dictionary is preferable to acquiring more annotated data.

1 Full Morphological Tagging

English Part of Speech (POS) tagging has been widely de...