
Algorithmic stability and ensemble-based learning.

Abstract

We explore two themes in formal learning theory. We begin with a detailed, general study of the relationship between the generalization error and stability of learning algorithms. We then examine ensemble-based learning from the points of view of stability, decorrelation, and threshold complexity.

A central problem of learning theory is bounding generalization error. Most such bounds have been obtained through uniform convergence, via some variant of VC dimension. These analyses focus on the space of classifiers available to a learning algorithm, rather than on the algorithm itself.

Bousquet and Elisseeff (2002) have shown, using McDiarmid's method of independent bounded differences (1989), that algorithmic stability implies good bounds on generalization error. However, their definition of stability is too restrictive to be widely applicable.

We introduce the more general notion of training stability. We show that training stability implies good bounds on generalization error even when the learner has infinite VC dimension. Our proof requires a result of independent interest: we generalize McDiarmid's theorem to the case when differences are bounded with high probability.

In the Probably-Approximately-Correct setting of Valiant (1984), training stability is necessary and sufficient for good error performance and serves as a distribution-dependent analog of VC dimension. This enables us to emphasize the learning algorithm instead of the classifier space.

We next consider algorithms (e.g., boosting) which construct ensembles of weak classifiers. We show that AdaBoost (Freund and Schapire, 1997), a widely used boosting algorithm, is stability-preserving.

We demonstrate that some boosting algorithms implicitly construct classifiers which are decorrelated (i.e., rarely jointly wrong). We present two new algorithms which build ensembles of explicitly decorrelated classifier pairs. Experiments indicate that these new algorithms are sometimes competitive with AdaBoost and with bagging (Breiman, 1996). The experiments employ a fast new algorithm for finding decorrelated pairs of decision stumps.

Lastly, we use threshold complexity to show that, under certain conditions, AdaBoost takes a long time to converge. In some situations, our pair-based algorithms are exponentially faster than AdaBoost. We analyze boosting as a dynamical system over the space of sequences of weights.
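For reference, the classical bounded-differences inequality of McDiarmid (1989), which the abstract says the thesis generalizes to differences bounded only with high probability, is standardly stated as follows: if $f : \mathcal{X}^n \to \mathbb{R}$ satisfies, for every coordinate $i$ and all $x_1, \dots, x_n, x_i' \in \mathcal{X}$,
\[
  \bigl| f(x_1, \dots, x_i, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \bigr| \le c_i,
\]
then for independent random variables $X_1, \dots, X_n$ and any $\epsilon > 0$,
\[
  \Pr\bigl[ f(X_1, \dots, X_n) - \mathbb{E}\, f(X_1, \dots, X_n) \ge \epsilon \bigr]
  \le \exp\!\left( \frac{-2\epsilon^2}{\sum_{i=1}^n c_i^2} \right).
\]
Applied to the generalization error of a stable learning algorithm viewed as a function of the training sample, this is the mechanism behind the Bousquet-Elisseeff bounds that the thesis extends.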
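To make the decorrelation criterion concrete, a minimal Python sketch of the empirical "jointly wrong" rate for a pair of classifiers might look like the following. The function name and toy data are illustrative assumptions, and this is only the measure itself, not the fast pair-finding algorithm referenced in the abstract.

    from typing import Sequence

    def joint_error_rate(h1_preds: Sequence[int],
                         h2_preds: Sequence[int],
                         labels: Sequence[int]) -> float:
        """Fraction of examples on which both classifiers are wrong.
        A small value means the pair is empirically decorrelated in the
        sense used above: the two classifiers are rarely jointly wrong."""
        jointly_wrong = sum(
            1 for p1, p2, y in zip(h1_preds, h2_preds, labels)
            if p1 != y and p2 != y
        )
        return jointly_wrong / len(labels)

    # Toy example: two stumps that err on different examples.
    labels   = [1, 1, -1, -1, 1, -1]
    h1_preds = [1, -1, -1, -1, 1, 1]   # wrong at indices 1 and 5
    h2_preds = [1, 1, -1, 1, -1, -1]   # wrong at indices 3 and 4
    print(joint_error_rate(h1_preds, h2_preds, labels))  # 0.0: never jointly wrong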

Bibliographic details

  • Author Kutin, Samuel Alan.
  • Author affiliation The University of Chicago.
  • Degree grantor The University of Chicago.
  • Subject Computer Science.
  • Degree Ph.D.
  • Year 2002
  • Pages 275 p.
  • Total pages 275
  • Format PDF
  • Language eng
  • CLC classification Automation technology, computer technology
  • Keywords
