Paper: An Empirical Study Of Active Learning With Support Vector Machines For Japanese Word Segmentation

ACL ID P02-1064
Title An Empirical Study Of Active Learning With Support Vector Machines For Japanese Word Segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2002
Authors

We explore how active learning with Support Vector Machines performs on a non-trivial task in natural language processing, using Japanese word segmentation as a test case. In particular, we discuss how the size of a pool affects the learning curve. We find that, in the early stage of training, a larger pool requires more labeled examples to reach a given level of accuracy than a smaller pool does. In addition, we propose a novel technique that uses a large number of unlabeled examples effectively by adding them gradually to a pool. The experimental results show that our technique requires fewer labeled examples than the technique used in previous research. To achieve 97.0% accuracy, the proposed technique needs 59.3% of the labeled examples that are requir...
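The pool-based loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration only: the selection rule (query the pool examples closest to the current hyperplane), the fixed pool-growth schedule (`pool_step` new examples per round), the Pegasos-style linear SVM trainer standing in for a full SVM package, and the toy 2-D data with a synthetic `oracle` in place of a human annotator. All function and parameter names are hypothetical.

```python
import random


def score(w, x):
    """Signed value of the linear decision function w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(examples, dim, epochs=20, lam=0.01, seed=0):
    """Pegasos-style SGD on the regularized hinge loss (assumed stand-in for an SVM package)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(examples)
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)           # decaying step size
            margin = y * score(w, x)        # margin under the current weights
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:                # hinge loss is active for this example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def active_learn(unlabeled, oracle, dim, seed_size=4, batch=2, pool_step=20, rounds=10):
    """Pool-based active learning in which the pool grows by pool_step examples each round."""
    rng = random.Random(0)
    remaining = list(unlabeled)
    rng.shuffle(remaining)
    labeled = [(x, oracle(x)) for x in remaining[:seed_size]]
    remaining = remaining[seed_size:]
    pool = []
    w = train_linear_svm(labeled, dim)
    for _ in range(rounds):
        # Add unlabeled examples to the pool gradually (assumed fixed schedule).
        pool.extend(remaining[:pool_step])
        remaining = remaining[pool_step:]
        if not pool:
            break
        # Assumed selection rule: query the examples nearest the hyperplane.
        pool.sort(key=lambda x: abs(score(w, x)))
        queried, pool = pool[:batch], pool[batch:]
        labeled.extend((x, oracle(x)) for x in queried)
        w = train_linear_svm(labeled, dim)
    return w, labeled


# Toy data: 2-D points plus a constant bias feature; the label is the sign of x0 - x1.
data_rng = random.Random(1)
points = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1), 1.0] for _ in range(200)]


def oracle(x):
    """Synthetic labeler standing in for a human annotator."""
    return 1 if x[0] > x[1] else -1


w, labeled = active_learn(points, oracle, dim=3)
print(len(labeled), "labeled examples used")
```

Setting `pool_step` to the full size of the unlabeled set reduces this to a single fixed pool, which makes it possible to compare the two settings on the number of labels needed to reach a target accuracy.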