Source: JMLR: Workshop and Conference Proceedings
Distributed Maximization of 'Submodular plus Diversity' Functions for Multi-label Feature Selection on Huge Datasets

Abstract

Many problems in machine learning and data mining are equivalent to selecting a non-redundant, high-"quality" set of objects. Recommender systems, feature selection, and data summarization are among the many applications. In this paper, we treat this task as an optimization problem that seeks to maximize the sum of a sum-sum diversity function and a non-negative monotone submodular function. The diversity function addresses redundancy, and the submodular function controls predictive quality. We consider the problem in big data settings (in other words, distributed and streaming settings), where the data cannot be stored on a single machine or the processing time is too high for a single machine. We show that a greedy algorithm achieves a constant-factor approximation of the optimal solution in these settings. Moreover, we formulate the multi-label feature selection problem as such an optimization problem. This formulation, combined with our algorithm, leads to the first distributed multi-label feature selection method. We compare this method with centralized multi-label feature selection methods from the literature and show that its performance is comparable to, and in some cases better than, that of current centralized methods.
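
To make the objective in the abstract concrete, here is a minimal single-machine sketch of greedy selection for a function of the form f(S) + lam * (sum of pairwise distances in S). It is an illustration under stated assumptions, not the paper's distributed or streaming algorithm: the names `objective` and `greedy_select`, the trade-off weight `lam`, the coverage-style `f`, and the toy data are all hypothetical, and the plain marginal-gain rule used here may differ from the rule the authors analyze.

```python
from itertools import combinations


def objective(S, f, d, lam=1.0):
    """Value of f(S) plus lam times the sum-sum diversity of S."""
    diversity = sum(d(u, v) for u, v in combinations(list(S), 2))
    return f(frozenset(S)) + lam * diversity


def greedy_select(items, k, f, d, lam=1.0):
    """Pick up to k items, each time adding the one with the largest marginal gain.

    Assumption: f takes a frozenset and is monotone submodular; d is a
    symmetric non-negative distance. No approximation guarantee is claimed
    for this sketch.
    """
    selected = []
    remaining = set(items)
    for _ in range(min(k, len(remaining))):
        chosen = frozenset(selected)
        base = f(chosen)

        def gain(u):
            # Submodular gain plus the distances from u to items already chosen.
            return (f(chosen | {u}) - base) + lam * sum(d(u, v) for v in selected)

        # sorted() makes tie-breaking deterministic (smallest item wins ties).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): each item "covers" some labels and has a 1-D position.
covers = {0: {"a", "b"}, 1: {"a", "b"}, 2: {"c"}, 3: {"b", "c"}}
position = {0: 0.0, 1: 0.1, 2: 1.0, 3: 0.5}


def coverage(S):
    """Number of distinct labels covered by S (a monotone submodular function)."""
    return len(set().union(*(covers[i] for i in S))) if S else 0


def distance(u, v):
    """Absolute difference of positions, used as the pairwise diversity term."""
    return abs(position[u] - position[v])


picked = greedy_select(covers.keys(), k=2, f=coverage, d=distance, lam=1.0)
value = objective(picked, coverage, distance, lam=1.0)
```

In this toy run, item 1 duplicates item 0 in both coverage and position, so the diversity term and the coverage term both steer the second pick away from it; raising or lowering `lam` shifts the balance between predictive quality and redundancy.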