JMLR: Workshop and Conference Proceedings

Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams


Abstract

Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering, etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm, Salsa, for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves on the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that no such algorithm achieves a better-than-0.5 approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.
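To make the streaming setup concrete, below is a minimal sketch of the single-threshold rule behind the 0.5-approximation baseline of Badanidiyuru et al. (2014) that the abstract refers to; it is not the Salsa algorithm itself. The set-coverage objective, function names, and the fixed guess of OPT are all illustrative assumptions: since OPT is unknown in a real stream, the full method runs many threshold guesses in parallel and keeps the best summary.

```python
# A sketch of the single-threshold rule behind the 0.5-approximation
# streaming algorithm of Badanidiyuru et al. (2014), which Salsa improves on.
# The coverage objective, names, and the fixed OPT guess are illustrative
# assumptions; this is not the Salsa algorithm from the paper.

def coverage_gain(covered, element):
    # Marginal gain of a set-cover style objective: f(S) = |union of S|.
    return len(element - covered)

def sieve_single_threshold(stream, k, opt_guess):
    # Keep an arriving element iff its marginal gain clears the threshold
    # (opt_guess/2 - f(S)) / (k - |S|); one pass, O(k) memory.
    summary, covered, value = [], set(), 0
    for element in stream:
        if len(summary) >= k:
            break
        gain = coverage_gain(covered, element)
        if gain >= (opt_guess / 2 - value) / (k - len(summary)):
            summary.append(element)
            covered |= element
            value += gain
    return summary, value

if __name__ == "__main__":
    # Elements are sets; f(S) = size of their union (monotone submodular).
    stream = [frozenset(s) for s in
              [{1, 2}, {2, 3}, {4}, {1, 5, 6}, {7, 8}, {3, 9}]]
    # OPT is unknown in a real stream; running several opt_guess values in
    # parallel and returning the best summary recovers the 0.5 guarantee.
    print(sieve_single_threshold(iter(stream), k=3, opt_guess=9))
```

Salsa's improvement beyond the factor 0.5 comes from exploiting the random arrival order of elements on top of this kind of thresholding.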
