We focus on distribution-free transductive learning. In this setting the learning algorithm is given a 'full sample' of unlabeled points. A training sample is then selected uniformly at random from the full sample, and the labels of the training points are revealed. The goal is to predict the labels of the remaining unlabeled points as accurately as possible. The full sample partitions the transductive hypothesis space into a finite number of equivalence classes; all hypotheses in the same equivalence class generate the same dichotomy of the full sample. We consider a large volume principle, whereby the priority of each equivalence class is proportional to its "volume" in the hypothesis space. The large volume principle was previously treated for the case of hyperplanes. In this paper, instead of hyperplanes, we consider soft classification vectors whose set of equivalence classes w.r.t. the full sample contains all possible dichotomies. Symmetry is broken by generating equivalence classes of non-uniform volume, defined via a non-axis-aligned, data-dependent ellipsoid. Since exact or quantifiably approximate volume estimation is computationally hard, we resort to a coarser approach whereby volume is related to the angles between hypotheses and the principal axes of the ellipsoid. This approach makes sense because long principal axes lie in regions of large volume. Our construction leads to a family of transductive algorithms, and here we focus on one instantiation. Although the resulting algorithm is defined in terms of a non-convex optimization problem, we obtain an efficient, globally optimal solution using a known technique. We also derive a data-dependent error bound for this algorithm. Our experiments with the new Approximate Volume Regularization (AVR) algorithm over 31 datasets show its overwhelming advantage over TSVM and SVM in text categorization and image classification.
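The abstract does not specify which "known technique" yields the global optimum of the non-convex problem, but one classical candidate for problems of this shape is minimizing a quadratic over a sphere, which is globally solvable despite non-convexity via an eigendecomposition and a one-dimensional secular-equation search. The sketch below illustrates that standard technique on a generic instance; the matrix `A`, vector `b`, and radius `r` are placeholders of my own and do not correspond to the paper's actual construction, and the code assumes the generic ("easy") case where `b` has a nonzero component along the eigenvector of the smallest eigenvalue.

```python
import numpy as np

def min_quadratic_on_sphere(A, b, r, tol=1e-10):
    """Globally minimize h^T A h - 2 b^T h subject to ||h|| = r.

    Illustrative sketch: at a global optimum there is a multiplier
    mu >= -lambda_min(A) with (A + mu*I) h = b, so in the eigenbasis
    h_i = beta_i / (lam_i + mu). Since ||h(mu)|| is strictly decreasing
    for mu > -lambda_min, the right mu is found by bisection (the
    "secular equation"). Assumes the easy case: beta[0] != 0.
    """
    lam, V = np.linalg.eigh(A)        # eigenvalues in ascending order
    beta = V.T @ b                    # coordinates of b in the eigenbasis

    def norm2(mu):                    # ||h(mu)||^2 as a function of mu
        return np.sum((beta / (lam + mu)) ** 2)

    lo = -lam[0] + 1e-12              # norm blows up near -lambda_min
    hi = -lam[0] + 1.0
    while norm2(hi) > r * r:          # grow the upper bracket until feasible
        hi = -lam[0] + 2.0 * (hi + lam[0])
    while hi - lo > tol:              # bisect on the monotone norm equation
        mid = 0.5 * (lo + hi)
        if norm2(mid) > r * r:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return V @ (beta / (lam + mu))
```

Because `mu >= -lambda_min` makes `A + mu*I` positive semidefinite, the returned stationary point is a global minimizer on the sphere, not merely a local one; this is what makes such non-convex problems efficiently solvable.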
However, on a different collection of UCI datasets, TSVM and SVM are significantly superior to AVR. We identify some factors that influence the success and failure of our algorithm. One interesting observation is that AVR has a significant advantage over TSVM precisely when TSVM outperforms SVM, and vice versa.