Randomized coordinate descent (RCD) methods are state-of-the-art algorithms for training linear predictors via minimizing regularized empirical risk. When the number of examples ($n$) is much larger than the number of features ($d$), a common strategy is to apply RCD to the dual problem. On the other hand, when the number of features is much larger than the number of examples, it makes sense to apply RCD directly to the primal problem. In this paper we provide the first joint study of these two approaches when applied to L2-regularized linear ERM. First, we show through a rigorous analysis that for dense data, the above intuition is precisely correct. However, we find that for sparse and structured data, primal RCD can significantly outperform dual RCD even if $d \ll n$, and vice versa, dual RCD can be much faster than primal RCD even if $n \ll d$. Moreover, we show that, surprisingly, a single sampling strategy minimizes both the (bound on the) number of iterations and the overall expected complexity of RCD. Note that the latter complexity measure also takes into account the average cost of the iterations, which depends on the structure and sparsity of the data, and on the sampling strategy employed. We confirm our theoretical predictions using extensive experiments with both synthetic and real data sets.
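To make the setting concrete, here is a minimal sketch of primal RCD applied to an L2-regularized least-squares objective (ridge regression), one instance of the linear ERM family discussed above. The objective $f(w) = \frac{1}{2n}\|Aw-b\|^2 + \frac{\lambda}{2}\|w\|^2$, the uniform sampling of coordinates, and all function and variable names are illustrative choices, not the paper's specific algorithmic setup:

```python
import numpy as np

def primal_rcd(A, b, lam, iters=5000, seed=0):
    """Illustrative primal randomized coordinate descent on the ridge objective
    f(w) = 1/(2n) ||A w - b||^2 + (lam/2) ||w||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    r = A @ w - b                       # residual A w - b, maintained incrementally
    L = (A ** 2).sum(axis=0) / n + lam  # coordinate-wise Lipschitz constants
    for _ in range(iters):
        j = rng.integers(d)             # uniform sampling; one of many strategies
        g = A[:, j] @ r / n + lam * w[j]  # partial derivative w.r.t. w_j
        step = g / L[j]                   # exact minimization along coordinate j
        w[j] -= step
        r -= step * A[:, j]             # update residual: cost ~ nnz of column j
    return w
```

Note that each iteration touches only one column of $A$, so its cost is proportional to the number of nonzeros in that column; this is the per-iteration cost that, together with the sparsity pattern and the sampling strategy, drives the overall expected complexity discussed in the abstract.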