Randomized coordinate descent (RCD) methods are state-of-the-art algorithms for training linear predictors via minimizing regularized empirical risk. When the number of examples ($n$) is much larger than the number of features ($d$), a common strategy is to apply RCD to the dual problem. On the other hand, when the number of features is much larger than the number of examples, it makes sense to apply RCD directly to the primal problem. In this paper we provide the first joint study of these two approaches when applied to L2-regularized linear ERM. First, we show through a rigorous analysis that for dense data, the above intuition is precisely correct. However, we find that for sparse and structured data, primal RCD can significantly outperform dual RCD even if $d \ll n$, and vice versa, dual RCD can be much faster than primal RCD even if $n \ll d$. Moreover, we show that, surprisingly, a single sampling strategy minimizes both the (bound on the) number of iterations and the overall expected complexity of RCD. Note that the latter complexity measure also takes into account the average cost of the iterations, which depends on the structure and sparsity of the data, and on the sampling strategy employed. We confirm our theoretical predictions using extensive experiments with both synthetic and real data sets.
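To make the setting concrete, here is a minimal sketch of primal RCD applied to an L2-regularized least-squares objective (ridge regression), one instance of the linear ERM family discussed above. The objective $f(w) = \frac{1}{2n}\|Aw-b\|^2 + \frac{\lambda}{2}\|w\|^2$, the uniform sampling of coordinates, and all function and variable names are illustrative choices, not the paper's specific algorithmic setup:

```python
import numpy as np

def primal_rcd(A, b, lam, iters=5000, seed=0):
    """Illustrative primal randomized coordinate descent on the ridge objective
    f(w) = 1/(2n) ||A w - b||^2 + (lam/2) ||w||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    r = A @ w - b                       # residual A w - b, maintained incrementally
    L = (A ** 2).sum(axis=0) / n + lam  # coordinate-wise Lipschitz constants
    for _ in range(iters):
        j = rng.integers(d)             # uniform sampling; one of many strategies
        g = A[:, j] @ r / n + lam * w[j]  # partial derivative w.r.t. w_j
        step = g / L[j]                   # exact minimization along coordinate j
        w[j] -= step
        r -= step * A[:, j]             # update residual: cost ~ nnz of column j
    return w
```

Note that each iteration touches only one column of $A$, so its cost is proportional to the number of nonzeros in that column; this is the per-iteration cost that, together with the sparsity pattern and the sampling strategy, drives the overall expected complexity discussed in the abstract.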