
Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty



Abstract

Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1-regularization, which is becoming popular in natural language processing because of its ability to produce compact models, cannot be efficiently applied in SGD training, due to the large dimensions of feature vectors and the fluctuations of approximate gradients. We present a simple method to solve these problems by penalizing the weights according to cumulative values for L1 penalty. We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging. Experimental results demonstrate that our method can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized log-linear models.
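The cumulative-penalty idea in the abstract can be sketched as follows: each weight is clipped toward zero by the total L1 penalty it could have received so far (tracked in a single scalar `u`) minus the penalty it has actually received (tracked per weight in `q`), so the penalty is applied lazily and only to weights touched by the current example. This is a minimal illustrative sketch, not the paper's reference implementation; all function and parameter names are ours.

```python
import math
import random

def apply_cumulative_penalty(w, q, i, u):
    """Clip weight i toward zero by the L1 penalty it has not yet received.

    u    -- total L1 penalty each weight *could* have received so far
    q[i] -- penalty that weight i has *actually* received so far
    """
    z = w[i]
    if w[i] > 0.0:
        w[i] = max(0.0, w[i] - (u + q[i]))
    elif w[i] < 0.0:
        w[i] = min(0.0, w[i] + (u - q[i]))
    q[i] += w[i] - z  # record how much penalty was actually applied

def sgd_l1_logreg(data, dim, lam=0.1, eta=0.1, epochs=20, seed=0):
    """L1-regularized binary logistic regression trained by SGD with a
    cumulative penalty (sketch).  data: list of (sparse feature dict, label)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    q = [0.0] * dim
    u = 0.0
    n = len(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Sparse dot product and sigmoid for the current example.
            s = sum(w[i] * v for i, v in x.items())
            p = 1.0 / (1.0 + math.exp(-s))
            u += eta * lam / n  # grow the cumulative penalty budget
            for i, v in x.items():
                w[i] += eta * (y - p) * v  # gradient step on active features only
                apply_cumulative_penalty(w, q, i, u)
    return w
```

Because clipping uses the *cumulative* penalty rather than the per-step one, a weight that fluctuates around zero due to noisy stochastic gradients is still driven exactly to zero, which is what yields compact models.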

