
Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty



Abstract

Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1-regularization, which is becoming popular in natural language processing because of its ability to produce compact models, cannot be efficiently applied in SGD training, due to the large dimensions of feature vectors and the fluctuations of approximate gradients. We present a simple method to solve these problems by penalizing the weights according to cumulative values for L1 penalty. We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging. Experimental results demonstrate that our method can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized log-linear models.
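The cumulative-penalty idea in the abstract can be sketched as follows: each weight is clipped toward zero by the total L1 penalty it could have received so far (tracked in a single scalar `u`) minus the penalty it has actually received (tracked per weight in `q`), so the penalty is applied lazily and only to weights touched by the current example. This is a minimal illustrative sketch, not the paper's reference implementation; all function and parameter names are ours.

```python
import math
import random

def apply_cumulative_penalty(w, q, i, u):
    """Clip weight i toward zero by the L1 penalty it has not yet received.

    u    -- total L1 penalty each weight *could* have received so far
    q[i] -- penalty that weight i has *actually* received so far
    """
    z = w[i]
    if w[i] > 0.0:
        w[i] = max(0.0, w[i] - (u + q[i]))
    elif w[i] < 0.0:
        w[i] = min(0.0, w[i] + (u - q[i]))
    q[i] += w[i] - z  # record how much penalty was actually applied

def sgd_l1_logreg(data, dim, lam=0.1, eta=0.1, epochs=20, seed=0):
    """L1-regularized binary logistic regression trained by SGD with a
    cumulative penalty (sketch).  data: list of (sparse feature dict, label)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    q = [0.0] * dim
    u = 0.0
    n = len(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Sparse dot product and sigmoid for the current example.
            s = sum(w[i] * v for i, v in x.items())
            p = 1.0 / (1.0 + math.exp(-s))
            u += eta * lam / n  # grow the cumulative penalty budget
            for i, v in x.items():
                w[i] += eta * (y - p) * v  # gradient step on active features only
                apply_cumulative_penalty(w, q, i, u)
    return w
```

Because clipping uses the *cumulative* penalty rather than the per-step one, a weight that fluctuates around zero due to noisy stochastic gradients is still driven exactly to zero, which is what yields compact models.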

