Published in: Annual Conference on Neural Information Processing Systems

Sparse Local Embeddings for Extreme Multi-label Classification

Abstract

The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding-based approaches attempt to make training and prediction tractable by assuming that the training label matrix is low-rank and reducing the effective number of labels by projecting the high-dimensional label vectors onto a low-dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies or scale to large problems, because the low-rank assumption is violated in most real-world applications. In this paper we develop the SLEEC classifier to address both limitations. The main technical contribution in SLEEC is a formulation for learning a small ensemble of local distance-preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows SLEEC to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings which preserve pairwise distances between only the nearest label vectors. We conducted extensive experiments on several real-world as well as benchmark data sets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that SLEEC can make significantly more accurate predictions than the state-of-the-art methods, including both embedding-based (by as much as 35%) and tree-based (by as much as 6%) methods. SLEEC can also scale efficiently to data sets with a million labels, which are beyond the reach of leading embedding methods.
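The idea of preserving pairwise distances only between the nearest label vectors can be illustrated with a small sketch. This is not the SLEEC implementation: it covers a single embedding rather than an ensemble, and the abstract does not specify the similarity measure or the optimizer. The sketch assumes inner products of unit-normalized label vectors as the similarity and uses plain gradient descent with step halving. All function names and parameters (`neighbor_mask`, `fit_local_embedding`, `dim`, `k`) are hypothetical.

```python
import numpy as np

def neighbor_mask(Y, k):
    """Mark each label vector's k most similar other vectors (symmetrized)."""
    Y = np.asarray(Y, dtype=float)
    sim = Y @ Y.T
    np.fill_diagonal(sim, -np.inf)               # exclude self-matches
    nearest = np.argsort(-sim, axis=1)[:, :k]    # top-k neighbours per row
    n = Y.shape[0]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nearest.ravel()] = True
    mask |= mask.T                               # keep the mask symmetric
    np.fill_diagonal(mask, True)                 # also preserve each vector's norm
    return mask

def masked_loss(Z, G, mask):
    """Squared error between Z Z^T and G, counted on neighbour pairs only."""
    R = np.where(mask, Z @ Z.T - G, 0.0)
    return float((R ** 2).sum()), R

def fit_local_embedding(Y, dim, k, steps=200, lr=0.05, seed=0):
    """Learn low-dimensional Z whose inner products match those of the
    label vectors, but only for pairs that are near neighbours."""
    Y = np.asarray(Y, dtype=float)
    # Assumption: unit-normalize rows so inner products act as similarities.
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    G = Y @ Y.T
    mask = neighbor_mask(Y, k)
    rng = np.random.default_rng(seed)
    Z = 0.1 * rng.standard_normal((Y.shape[0], dim))
    loss, R = masked_loss(Z, G, mask)
    losses = [loss]
    for _ in range(steps):
        grad = 4.0 * R @ Z                       # gradient of the masked loss (R is symmetric)
        new_Z = Z - lr * grad
        new_loss, new_R = masked_loss(new_Z, G, mask)
        if new_loss < loss:                      # accept only improving steps
            Z, loss, R = new_Z, new_loss, new_R
        else:
            lr *= 0.5                            # otherwise shrink the step size
        losses.append(loss)
    return Z, mask, losses

# Toy data: 40 points, 25 labels, sparse binary label vectors.
rng = np.random.default_rng(1)
Y = (rng.random((40, 25)) < 0.15).astype(float)
Y[np.arange(40), rng.integers(0, 25, size=40)] = 1.0  # at least one label per point
Z, mask, losses = fit_local_embedding(Y, dim=5, k=5)
print(f"masked loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the loss ignores pairs outside the neighbour mask, the embedding is not required to reproduce the full label matrix with a low-rank factorization. That is the intuition behind dropping the global low-rank assumption described above.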
