Paper: Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty

ACL ID P09-1054
Title Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009
Authors Yoshimasa Tsuruoka, Jun'ichi Tsujii, Sophia Ananiadou

Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1-regularization, which is becoming popular in natural language processing because of its ability to produce compact models, cannot be efficiently applied in SGD training, due to the large dimensions of feature vectors and the fluctuations of approximate gradients. We present a simple method to solve these problems by penalizing the weights according to cumulative values for L1 penalty. We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging.
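
The cumulative penalty idea can be summarized as follows: alongside the usual stochastic gradient step, track the total L1 penalty each weight could have received so far (u) and the penalty actually applied to each weight (q_i), then clip each updated weight at zero so that regularization never pushes it past the origin. The sketch below is a minimal illustration of this idea for a binary log-linear (logistic regression) model with sparse features; the function name, learning-rate schedule, and data layout are assumptions for illustration, not the paper's implementation.

```python
import math
from collections import defaultdict

def train_sgd_cumulative_l1(examples, num_epochs=10, eta0=0.1, C=1.0):
    """Sketch of SGD with a cumulative L1 penalty.

    examples: list of (features, label) pairs, where features is a
    sparse dict {feature_id: value} and label is 0 or 1.
    """
    w = defaultdict(float)   # model weights
    q = defaultdict(float)   # L1 penalty actually applied to each weight
    u = 0.0                  # L1 penalty each weight could have received
    N = len(examples)
    t = 0
    for _ in range(num_epochs):
        for x, y in examples:
            eta = eta0 / (1.0 + t / float(N))   # assumed decaying learning rate
            u += eta * C / N                    # accumulate the maximum possible penalty
            # Model probability under the current weights (binary log-linear model).
            s = sum(w[f] * v for f, v in x.items())
            p = 1.0 / (1.0 + math.exp(-s))
            for f, v in x.items():
                # Stochastic gradient step on the unregularized log-likelihood.
                w[f] += eta * (y - p) * v
                # Apply the cumulative L1 penalty, clipped at zero so the
                # weight cannot cross the origin because of regularization.
                z = w[f]
                if w[f] > 0:
                    w[f] = max(0.0, w[f] - (u + q[f]))
                elif w[f] < 0:
                    w[f] = min(0.0, w[f] + (u - q[f]))
                q[f] += w[f] - z
            t += 1
    return w
```

Because only the weights of features active in the current example are touched, each update costs time proportional to the number of active features, while the clipping step still drives many weights exactly to zero, yielding the compact models the abstract refers to.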