Paper: Training A Naive Bayes Classifier Via The EM Algorithm With A Class Distribution Constraint

ACL ID W03-0417
Title Training A Naive Bayes Classifier Via The EM Algorithm With A Class Distribution Constraint
Venue International Conference on Computational Natural Language Learning
Session Main Conference
Year 2003
Authors
  • Yoshimasa Tsuruoka (CREST Japan Science and Technology Corporation, Saitama Japan; University of Tokyo, Tokyo Japan)
  • Jun'ichi Tsujii (University of Tokyo, Tokyo Japan; CREST Japan Science and Technology Corporation, Saitama Japan)

Combining a naive Bayes classifier with the EM algorithm is one of the promising approaches for making use of unlabeled data in disambiguation tasks that rely on local context features, such as word sense disambiguation and spelling correction. However, using unlabeled data via the basic EM algorithm often causes disastrous performance degradation instead of improving classification performance, resulting in poor classification performance on average. In this study, we introduce a class distribution constraint into the iteration process of the EM algorithm. This constraint keeps the class distribution of unlabeled data consistent with the class distribution estimated from labeled data, preventing the EM algorithm from converging to an undesirable state. Experimental re...
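The idea of keeping the class distribution of unlabeled data consistent with the labeled-data estimate can be illustrated with a small sketch. This is not necessarily the formulation used in the paper: it assumes one simple way to impose such a constraint, namely rescaling the E-step posteriors of the unlabeled examples by the ratio of the target class probability to the class probability currently implied by those posteriors, then renormalizing each example. The function name `constrain_posteriors` and its parameters are hypothetical.

```python
def constrain_posteriors(posteriors, target_dist, n_rounds=1):
    """Pull the average class posterior of unlabeled data toward target_dist.

    posteriors  : list of per-example class posteriors (each row sums to 1),
                  as produced by an E-step on unlabeled data.
    target_dist : class distribution estimated from labeled data.
    n_rounds    : number of rescale-and-renormalize passes (assumed parameter).
    """
    n = len(posteriors)
    k = len(target_dist)
    for _ in range(n_rounds):
        # Class distribution currently implied by the unlabeled posteriors.
        current = [sum(p[c] for p in posteriors) / n for c in range(k)]
        adjusted = []
        for p in posteriors:
            # Up-weight classes that are under-represented relative to the
            # target, down-weight over-represented ones.
            scaled = [
                p[c] * target_dist[c] / current[c] if current[c] > 0 else 0.0
                for c in range(k)
            ]
            z = sum(scaled)
            # Fall back to the target distribution if everything was zeroed.
            adjusted.append([s / z for s in scaled] if z > 0 else list(target_dist))
        posteriors = adjusted
    return posteriors


# Hypothetical E-step output that has drifted toward class 0.
unlabeled_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
labeled_class_dist = [0.5, 0.5]  # assumed estimate from labeled data

constrained = constrain_posteriors(unlabeled_posteriors, labeled_class_dist)
```

In an EM loop, a step like this would sit between the E-step and the M-step, so that the parameters are re-estimated from posteriors whose overall class proportions stay close to those observed in the labeled data.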