Journal: Machine Learning

Block coordinate descent algorithms for large-scale sparse multiclass classification


Abstract

Over the past decade, ℓ1 regularization has emerged as a powerful way to learn classifiers with implicit feature selection. More recently, mixed-norm (e.g., ℓ1/ℓ2) regularization has been utilized as a way to select entire groups of features. In this paper, we propose a novel direct multiclass formulation specifically designed for large-scale and high-dimensional problems such as document classification. Based on a multiclass extension of the squared hinge loss, our formulation employs ℓ1/ℓ2 regularization so as to force weights corresponding to the same features to be zero across all classes, resulting in compact and fast-to-evaluate multiclass models. For optimization, we employ two globally-convergent variants of block coordinate descent, one with line search (Tseng and Yun in Math. Program. 117:387–423, 2009) and the other without (Richtárik and Takáč in Math. Program. 1–38, 2012a; Tech. Rep. arXiv:1212.0873, 2012b). We present the two variants in a unified manner and develop the core components needed to efficiently solve our formulation. The end result is a couple of block coordinate descent algorithms specifically tailored to our multiclass formulation. Experimentally, we show that block coordinate descent performs favorably compared to other solvers such as FOBOS, FISTA and SpaRSA. Furthermore, we show that our formulation obtains very compact multiclass models and outperforms ℓ1/ℓ2-regularized multiclass logistic regression in terms of training speed, while achieving comparable test accuracy.
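To illustrate how ℓ1/ℓ2 regularization can zero out a feature's weights across all classes at once, the following is a minimal sketch of the standard group soft-thresholding step (the proximal operator of the ℓ2 norm applied to one row of the weight matrix). It is not taken from the paper: the function name `group_soft_threshold`, the toy matrix `W`, and the threshold value are assumptions chosen for illustration, and the paper's actual block updates may differ.

```python
import numpy as np


def group_soft_threshold(w_row, threshold):
    """Shrink one feature's weight vector (one entry per class) toward zero.

    Hypothetical helper: returns the all-zero vector when the row's l2 norm
    is at most `threshold`, otherwise rescales the row by (1 - threshold / norm).
    """
    norm = np.linalg.norm(w_row)
    if norm <= threshold:
        return np.zeros_like(w_row)
    return (1.0 - threshold / norm) * w_row


# Toy weight matrix (assumed values): 2 features (rows) x 3 classes (columns).
W = np.array([[0.1, -0.2, 0.1],
              [3.0, 4.0, 0.0]])

# Apply the step to each feature block with an assumed threshold of 1.0.
W_new = np.vstack([group_soft_threshold(row, 1.0) for row in W])
print(W_new)
```

In this toy case the first feature has a small norm and is removed for every class, while the second is only shrunk, which is the row-wise sparsity pattern the abstract describes.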
