Efficient Collaborative Approximation in MapReduce Without Missing Rare Keys

Abstract

Recent proposals extend MapReduce, a widely used Big Data processing framework, with sampling to improve performance by producing approximate results with statistical error bounds. However, because these systems perform global uniform sampling across the entire key space of the input data, they may completely miss rare keys, which may be unacceptable in some applications. The well-known stratified sampling technique avoids missing rare keys by obtaining the same number of samples for each key, which also achieves good performance by sampling popular keys infrequently and rare keys more often. While online stratified sampling has been done in centralized settings, a straightforward extension to MapReduce's distributed setting cannot easily leverage the number of per-key samples seen globally by all the Mappers to reduce each Mapper's future sampling rate. Because there are hundreds of Mappers in a typical MapReduce job, such feedback can drastically reduce oversampling and improve performance. We present MaDSOS (MapReduce with Distributed Stratified Online Sampling), which makes two contributions: (1) Instead of a fixed n per-key samples and the resultant sampling rates, we propose a telescoping algorithm that uses fixed sampling rates of the form 1/2^k and collects between n and 2n samples per key. (2) We propose a collaborative feedback scheme, enabled by the specific form of the sampling rates and the leniency in the sample counts, that efficiently cuts the sampling rates, and thus oversampling, once the desired number of samples has been seen globally. For our MapReduce benchmarks, MaDSOS improves performance by 59% over Hadoop while guaranteeing never to miss rare keys, and achieves 2.5% per-key error compared to 100% worst-case error under global sampling at a fixed rate for all the keys.
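
The abstract gives no implementation details, so the following Python sketch is only one plausible reading of the two ideas, not the authors' MaDSOS code. It assumes that each key keeps an exponent k (sampling rate 1/2^k), that a key's retained samples are halved at random whenever they reach 2n, and that a coordinator can raise k from counts reported by the Mappers. The names `TelescopingSampler`, `offer`, `rate`, and `feedback_exponent` are hypothetical.

```python
import random
from collections import defaultdict


class TelescopingSampler:
    """Per-key sampler with rates of the form 1/2**k (illustrative assumption)."""

    def __init__(self, n, seed=0):
        self.n = n                        # target: keep between n and 2n samples per key
        self.rng = random.Random(seed)    # seeded so runs are repeatable
        self.k = defaultdict(int)         # per-key exponent; rate = 1 / 2**k
        self.samples = defaultdict(list)  # retained samples per key

    def rate(self, key):
        return 1.0 / (1 << self.k[key])

    def offer(self, key, value):
        # Accept the record with the key's current probability 1/2**k.
        if self.rng.random() < self.rate(key):
            bucket = self.samples[key]
            bucket.append(value)
            # On reaching 2n samples, keep a random half and halve the rate.
            if len(bucket) >= 2 * self.n:
                self.samples[key] = self.rng.sample(bucket, self.n)
                self.k[key] += 1


def feedback_exponent(per_mapper_counts, k, n):
    """Hypothetical coordinator step: from the retained-sample counts that each
    Mapper reports for one key at exponent k, return the exponent to use next."""
    total = sum(per_mapper_counts)
    while total >= 2 * n:
        total //= 2  # each rate halving roughly halves the samples worth keeping
        k += 1
    return k


sampler = TelescopingSampler(n=4)
for i in range(1000):
    sampler.offer("popular", i)   # frequent key: rate shrinks as records arrive
for i in range(3):
    sampler.offer("rare", i)      # rare key: stays at rate 1, nothing is dropped

print(len(sampler.samples["popular"]), sampler.rate("popular"))
print(len(sampler.samples["rare"]), sampler.rate("rare"))
print(feedback_exponent([3, 3, 3], k=0, n=4))
```

In this sketch a key with fewer than 2n records is always sampled at rate 1, which is why a rare key cannot be missed. Because every rate is a power of 1/2, the feedback message per key can be a single integer exponent.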

Bibliographic details

Source
International Conference on Cloud and Autonomic Computing | 2017 | 188p | 12 pages total
Authors
Nitin; Mithuna Thottethodi; T. N. Vijaykumar; Milind Kulkarni
Original format: PDF
Chinese Library Classification: TP393-53

