Paper: Mistake-Driven Learning In Text Categorization

ACL ID W97-0306
Title Mistake-Driven Learning In Text Categorization
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 1997

Learning problems in the text processing domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Three characteristic properties of this domain are (a) very high dimensionality, (b) both the learned concepts and the instances reside very sparsely in the feature space, and (c) a high variation in the number of active features in an instance. In this work we study three mistake-driven learning algorithms for a typical task of this nature: text categorization. We argue that these algorithms, which categorize documents by learning a linear separator in the feature space, have a few properties that make them ideal for this domain. We then show that a quantum leap in performance is achieved when we further modify the algorithms to b...
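To make "mistake-driven learning of a linear separator" concrete, here is a minimal sketch. This excerpt of the abstract does not name the three algorithms, so the sketch uses a Winnow-style multiplicative update purely as one illustrative mistake-driven learner. It is not presented as the paper's exact method. The class `SparseWinnow`, its parameters `theta`, `alpha` and `beta`, and the toy documents are all hypothetical. The sketch shows two properties the abstract alludes to. Each document is stored only as its set of active features. Weights change only when a mistake is made, and then only for the features active in that document.

```python
from typing import Dict, List, Set, Tuple

# A document is represented only by its set of active features (e.g. words);
# absent features are never stored, reflecting the sparsity of the domain.
Document = Set[str]


class SparseWinnow:
    """Hypothetical Winnow-style mistake-driven linear classifier.

    The prediction is a linear threshold over the active features; the
    weights change only when the classifier makes a mistake, and then only
    for the features active in the misclassified document.
    """

    def __init__(self, theta: float, alpha: float = 2.0, beta: float = 0.5) -> None:
        self.theta = theta  # decision threshold (illustrative value)
        self.alpha = alpha  # promotion factor, > 1
        self.beta = beta    # demotion factor, between 0 and 1
        self.weights: Dict[str, float] = {}  # features never updated keep weight 1.0

    def score(self, doc: Document) -> float:
        # Dot product of the weight vector with a binary feature vector:
        # only the active features contribute.
        return sum(self.weights.get(f, 1.0) for f in doc)

    def predict(self, doc: Document) -> bool:
        return self.score(doc) >= self.theta

    def update(self, doc: Document, label: bool) -> bool:
        """Promote or demote active features on a mistake; return True if one occurred."""
        if self.predict(doc) == label:
            return False
        factor = self.alpha if label else self.beta
        for f in doc:
            self.weights[f] = self.weights.get(f, 1.0) * factor
        return True


def train(model: SparseWinnow, data: List[Tuple[Document, bool]], epochs: int = 5) -> int:
    """Run several passes over the data and return the total number of mistakes."""
    mistakes = 0
    for _ in range(epochs):
        for doc, label in data:
            if model.update(doc, label):
                mistakes += 1
    return mistakes


# Toy data (hypothetical): is the document about grain?
docs: List[Tuple[Document, bool]] = [
    ({"wheat", "grain", "export"}, True),
    ({"stock", "market", "shares"}, False),
    ({"wheat", "harvest"}, True),
    ({"market", "shares", "profit"}, False),
]

model = SparseWinnow(theta=3.0)
print(train(model, docs))                 # 2 mistakes, both in the first pass
print(model.predict({"wheat", "grain"}))  # True
print(len(model.weights))                 # 5: only features seen in a mistake are stored
```

The score is a sum over active features, so with a fixed threshold a long document clears it more easily than a short one. This is one way property (c), the variation in the number of active features, affects a plain linear threshold classifier.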