Paper: Enhancing Unlexicalized Parsing Performance Using a Wide Coverage Lexicon, Fuzzy Tag-Set Mapping, and EM-HMM-Based Lexical Probabilities

ACL ID E09-1038
Title Enhancing Unlexicalized Parsing Performance Using a Wide Coverage Lexicon, Fuzzy Tag-Set Mapping, and EM-HMM-Based Lexical Probabilities
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2009
Authors

We present a framework for interfacing a PCFG parser with lexical information from an external resource that follows a different tagging scheme than the treebank. This is achieved by defining a stochastic mapping layer between the two resources. Lexical probabilities for rare events are estimated in a semi-supervised manner from a lexicon and large unannotated corpora. We show that this solution greatly enhances the performance of an unlexicalized Hebrew PCFG parser, resulting in state-of-the-art Hebrew parsing results both when a segmentation oracle is assumed and in a real-world scenario of parsing unsegmented tokens.
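To make the idea of a stochastic mapping layer concrete, the following is a minimal sketch, not the authors' implementation. It assumes the external resource supplies a distribution over its own tags for a word, and that the mapping layer is a table of conditional probabilities P(treebank tag | lexicon tag). The function name `map_tag_distribution`, the tag names, and all probability values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_tag_distribution(lex_tag_probs, mapping):
    """Project a distribution over lexicon tags onto treebank tags.

    lex_tag_probs: {lexicon_tag: P(lexicon_tag | word)}
    mapping:       {lexicon_tag: {treebank_tag: P(treebank_tag | lexicon_tag)}}
    Returns        {treebank_tag: P(treebank_tag | word)}
    """
    tb_probs = defaultdict(float)
    for lex_tag, p_lex in lex_tag_probs.items():
        # Lexicon tags with no mapping entry contribute no mass.
        for tb_tag, p_map in mapping.get(lex_tag, {}).items():
            # Marginalize over the lexicon tag.
            tb_probs[tb_tag] += p_lex * p_map
    return dict(tb_probs)


# Hypothetical toy values: one lexicon tag spreads over several treebank tags.
mapping = {
    "PARTICIPLE": {"VB": 0.5, "JJ": 0.25, "NN": 0.25},
    "NOUN": {"NN": 1.0},
}
lex_tag_probs = {"PARTICIPLE": 0.5, "NOUN": 0.5}

result = map_tag_distribution(lex_tag_probs, mapping)
for tag in sorted(result):
    print(tag, result[tag])
```

Because each row of the mapping sums to one, the projected distribution also sums to one whenever every lexicon tag has a mapping entry. A many-to-many table of this kind lets a single lexicon tag contribute probability mass to several treebank tags instead of forcing a one-to-one tag conversion.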