Journal: Neurocomputing

Escaping the curse of dimensionality in similarity learning: Efficient Frank-Wolfe algorithm and generalization bounds

Abstract

Similarity and metric learning provides a principled approach to construct a task-specific similarity from weakly supervised data. However, these methods are subject to the curse of dimensionality: as the number of features grows large, poor generalization is to be expected and training becomes intractable due to high computational and memory costs. In this paper, we propose a similarity learning method that can efficiently deal with high-dimensional sparse data. This is achieved through a parameterization of similarity functions by convex combinations of sparse rank-one matrices, together with the use of a greedy approximate Frank-Wolfe algorithm which provides an efficient way to control the number of active features. We show that the convergence rate of the algorithm, as well as its time and memory complexity, are independent of the data dimension. We further provide a theoretical justification of our modeling choices through an analysis of the generalization error, which depends logarithmically on the sparsity of the solution rather than on the number of features. Our experiments on datasets with up to one million features demonstrate the ability of our approach to generalize well despite the high dimensionality as well as its superiority compared to several competing methods. (C) 2018 Elsevier B.V. All rights reserved.
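
The combination described in the abstract (a similarity matrix built as a convex combination of sparse rank-one matrices, grown one element at a time by Frank-Wolfe) can be illustrated with a minimal sketch. Several details below are assumptions made for illustration and are not stated in the abstract: the atoms are taken to be `scale * (e_i ± e_j)(e_i ± e_j)^T`, supervision is assumed to come as triplets (x, y, z) where x should be more similar to y than to z, the loss is a squared hinge, the step size is the standard 2/(k+2), and the atom search is exact and dense rather than the approximate, sparsity-aware search the abstract refers to. All function names are hypothetical.

```python
import numpy as np

def loss_and_grad(M, X, Y, Z):
    """Squared hinge loss on triplet margins x^T M (y - z), with its gradient in M."""
    margins = np.einsum("ti,ij,tj->t", X, M, Y - Z)
    slack = np.maximum(0.0, 1.0 - margins)
    loss = np.mean(slack ** 2)
    # d(margin_t)/dM = x_t (y_t - z_t)^T, so the gradient is a weighted sum of outer products.
    G = (-2.0 / len(margins)) * (X * slack[:, None]).T @ (Y - Z)
    return loss, G

def best_atom(G, scale):
    """Linear minimization step: the atom scale*(e_i +/- e_j)(e_i +/- e_j)^T minimizing <atom, G>."""
    d = G.shape[0]
    diag = np.diag(G)
    base = diag[:, None] + diag[None, :]   # G_ii + G_jj
    cross = G + G.T                        # G_ij + G_ji
    iu, ju = np.triu_indices(d, k=1)       # all pairs i < j
    pos = base[iu, ju] + cross[iu, ju]     # inner product for (e_i + e_j)
    neg = base[iu, ju] - cross[iu, ju]     # inner product for (e_i - e_j)
    kp, kn = np.argmin(pos), np.argmin(neg)
    if pos[kp] <= neg[kn]:
        i, j, sign = iu[kp], ju[kp], 1.0
    else:
        i, j, sign = iu[kn], ju[kn], -1.0
    u = np.zeros(d)
    u[i], u[j] = 1.0, sign
    return scale * np.outer(u, u)          # touches only 4 entries of the d x d matrix

def frank_wolfe_similarity(X, Y, Z, scale=1.0, n_iters=50):
    """Frank-Wolfe over the convex hull of the atoms; each iteration adds at most one atom."""
    d = X.shape[1]
    M = np.zeros((d, d))
    losses = []
    for k in range(n_iters):
        loss, G = loss_and_grad(M, X, Y, Z)
        losses.append(loss)
        B = best_atom(G, scale)
        gamma = 2.0 / (k + 2.0)            # gamma = 1 at k = 0, so M becomes a single atom
        M = (1.0 - gamma) * M + gamma * B
    return M, losses
```

Because each iteration adds at most one atom involving two features, the number of nonzero entries in `M` after `n_iters` steps is at most `4 * n_iters`, regardless of the number of features. This is the sense in which the iteration count controls the number of active features. The dense gradient used here is for readability only and would not scale to the high-dimensional setting the abstract targets.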