Paper: Enhanced Word Decomposition by Calibrating the Decision Threshold of Probabilistic Models and Using a Model Ensemble

ACL ID P10-1039
Title Enhanced Word Decomposition by Calibrating the Decision Threshold of Probabilistic Models and Using a Model Ensemble
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2010
Authors

This paper demonstrates that the use of ensemble methods and carefully calibrating the decision threshold can significantly improve the performance of machine learning methods for morphological word decomposition. We employ two algorithms which come from a family of generative probabilistic models. The models consider segment boundaries as hidden variables and include probabilities for letter transitions within segments. The advantage of this model family is that it can learn from small datasets and easily generalises to larger datasets. The first algorithm PROMODES, which participated in the Morpho Challenge 2009 (an international competition for unsupervised morphological analysis) employs a lower order model whereas the second algorithm PROMODES-H is a novel deve...
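
To make the idea of a decision threshold concrete, here is a minimal sketch and not the PROMODES implementation. It assumes each model outputs one boundary probability per position between two letters; the function names `segment` and `average_probs`, the example probabilities, and the use of simple averaging as the ensemble rule are all hypothetical choices made for illustration.

```python
def segment(word, boundary_probs, threshold=0.5):
    """Split `word` wherever a boundary probability exceeds `threshold`.

    boundary_probs[i] is an assumed probability of a segment boundary
    between word[i] and word[i + 1], so it needs len(word) - 1 entries.
    """
    if len(boundary_probs) != len(word) - 1:
        raise ValueError("need one probability per gap between letters")
    segments, start = [], 0
    for i, p in enumerate(boundary_probs):
        if p > threshold:  # the decision threshold being calibrated
            segments.append(word[start:i + 1])
            start = i + 1
    segments.append(word[start:])  # remaining letters form the last segment
    return segments


def average_probs(prob_lists):
    """Combine several models by averaging their boundary probabilities.

    Averaging is an illustrative assumption; the abstract does not say
    how the ensemble combines its members.
    """
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


# Hypothetical outputs of two models for the six gaps in "walking".
model_a = [0.05, 0.10, 0.20, 0.45, 0.10, 0.05]
model_b = [0.10, 0.10, 0.10, 0.75, 0.10, 0.10]

print(segment("walking", model_a, threshold=0.5))  # no boundary passes 0.5
print(segment("walking", model_a, threshold=0.4))  # lower threshold splits walk|ing
print(segment("walking", average_probs([model_a, model_b])))
```

With these made-up numbers, lowering the threshold from 0.5 to 0.4 lets the first model recover the boundary before "ing", and averaging the two models recovers it at the default threshold. This shows why both calibration and combination can change the predicted segmentation.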