Handling Data Skew in MapReduce Cluster by Using Partition Tuning

Abstract

The healthcare industry generates large amounts of data, and analyzing this data has emerged as an important problem in recent years. The MapReduce programming model has been used successfully for big data analytics. However, data skew invariably occurs in big data analytics and seriously degrades efficiency. To overcome the data skew problem in MapReduce, we previously proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In contrast to the one-stage partitioning strategy of the traditional MapReduce model, PTSH uses a two-stage strategy together with a partition tuning method: it disperses key-value pairs across virtual partitions and recombines the partitions when data skew occurs. The robustness and efficiency of the algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that the PTSH algorithm handles data skew in MapReduce efficiently and improves the performance of MapReduce jobs compared with native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that adopting the PTSH algorithm significantly reduces the time needed for rule extraction, since it is more suitable for association rule mining (ARM) on healthcare data.
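The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PTSH implementation, since the abstract does not give the algorithm's details. The sketch assumes that stage one hashes keys into many fine-grained virtual partitions and that stage two recombines them with a greedy largest-first assignment to the least-loaded reducer. The function names (`virtual_partition`, `tune_partitions`, `reducer_loads`), the CRC32 hash, and the greedy rule are all assumptions chosen for illustration.

```python
import heapq
import zlib


def virtual_partition(key: str, num_virtual: int) -> int:
    """Stage 1 (assumed): hash a key into one of many small virtual partitions."""
    return zlib.crc32(key.encode("utf-8")) % num_virtual


def tune_partitions(virtual_sizes: dict[int, int], num_reducers: int) -> dict[int, int]:
    """Stage 2 (assumed): recombine virtual partitions into reducer partitions.

    Greedy rule: take virtual partitions from largest to smallest and give
    each one to the reducer with the smallest load so far.
    """
    # Min-heap of (current_load, reducer_id).
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment: dict[int, int] = {}
    # Sort by descending size; break ties by partition id for determinism.
    for vp, size in sorted(virtual_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[vp] = reducer
        heapq.heappush(heap, (load + size, reducer))
    return assignment


def reducer_loads(assignment: dict[int, int],
                  virtual_sizes: dict[int, int],
                  num_reducers: int) -> list[int]:
    """Total number of key-value pairs each reducer receives."""
    loads = [0] * num_reducers
    for vp, reducer in assignment.items():
        loads[reducer] += virtual_sizes[vp]
    return loads


if __name__ == "__main__":
    # Hypothetical skewed input: virtual partition 0 holds half of all pairs.
    sizes = {0: 50, 1: 10, 2: 10, 3: 10, 4: 10, 5: 10}
    plan = tune_partitions(sizes, num_reducers=2)
    print(reducer_loads(plan, sizes, 2))  # prints [50, 50]
```

With a single-stage hash partitioner, the heavy partition would land on one reducer together with whatever else hashes there. Here the many small virtual partitions give the second stage room to even out the load.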