JMLR: Workshop and Conference Proceedings

Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams


Abstract

Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering, etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm, Salsa, for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves on the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that no such algorithm achieves a better-than-0.5 approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.
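To make the streaming setup concrete, below is a minimal sketch of the single-threshold rule behind the 0.5-approximation baseline of Badanidiyuru et al. (2014) that the abstract refers to; it is not the Salsa algorithm itself. The set-coverage objective, function names, and the fixed guess of OPT are all illustrative assumptions: since OPT is unknown in a real stream, the full method runs many threshold guesses in parallel and keeps the best summary.

```python
# A sketch of the single-threshold rule behind the 0.5-approximation
# streaming algorithm of Badanidiyuru et al. (2014), which Salsa improves on.
# The coverage objective, names, and the fixed OPT guess are illustrative
# assumptions; this is not the Salsa algorithm from the paper.

def coverage_gain(covered, element):
    # Marginal gain of a set-cover style objective: f(S) = |union of S|.
    return len(element - covered)

def sieve_single_threshold(stream, k, opt_guess):
    # Keep an arriving element iff its marginal gain clears the threshold
    # (opt_guess/2 - f(S)) / (k - |S|); one pass, O(k) memory.
    summary, covered, value = [], set(), 0
    for element in stream:
        if len(summary) >= k:
            break
        gain = coverage_gain(covered, element)
        if gain >= (opt_guess / 2 - value) / (k - len(summary)):
            summary.append(element)
            covered |= element
            value += gain
    return summary, value

if __name__ == "__main__":
    # Elements are sets; f(S) = size of their union (monotone submodular).
    stream = [frozenset(s) for s in
              [{1, 2}, {2, 3}, {4}, {1, 5, 6}, {7, 8}, {3, 9}]]
    # OPT is unknown in a real stream; running several opt_guess values in
    # parallel and returning the best summary recovers the 0.5 guarantee.
    print(sieve_single_threshold(iter(stream), k=3, opt_guess=9))
```

Salsa's improvement beyond the factor 0.5 comes from exploiting the random arrival order of elements on top of this kind of thresholding.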
