An Algorithm of Data Skew in Spark Based on Partition


Abstract

Many algorithms have been proposed to address data skew, but most of them target Hadoop, and because Spark's execution mechanism differs, many of their advantages cannot be fully realized in Spark. Tang Zhuo et al. proposed SKRSP, an adaptive partitioning method for handling data skew in Spark applications. Compared with previous work, SKRSP alleviates data skew more effectively, and its benefit becomes more pronounced as the skew increases. However, SKRSP was studied under the assumption that all nodes in the cluster share the same hardware and software configuration. This paper presents LBKRS, a load balancing and key redistribution algorithm based on Spark, which optimizes SKRSP from the perspective of load balancing. By monitoring the CPU utilization, memory utilization, and other information of the compute nodes, LBKRS handles data skew better on clusters whose nodes have different configurations and is better suited to real production environments.
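The abstract does not give the details of LBKRS, so the following is only a minimal sketch of the general idea it describes: redistributing keys in proportion to each node's spare capacity rather than evenly. The `NodeStats` fields, the `capacity` formula, and the greedy placement in `assign_keys` are all illustrative assumptions, not the paper's method and not a Spark API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStats:
    """Hypothetical per-node metrics; the field names are assumptions."""
    name: str
    cpu_util: float  # fraction of CPU in use, 0.0 to 1.0
    mem_util: float  # fraction of memory in use, 0.0 to 1.0


def capacity(node: NodeStats, cpu_weight: float = 0.5) -> float:
    """Spare capacity as a weighted mix of idle CPU and free memory (assumed formula)."""
    spare = cpu_weight * (1.0 - node.cpu_util) + (1.0 - cpu_weight) * (1.0 - node.mem_util)
    return max(spare, 1e-6)  # keep a saturated node from causing division by zero


def assign_keys(key_sizes: Dict[str, int], nodes: List[NodeStats]) -> Dict[str, str]:
    """Place the largest keys first, each on the node with the lowest load relative to capacity."""
    caps = {n.name: capacity(n) for n in nodes}
    load = {n.name: 0.0 for n in nodes}
    placement: Dict[str, str] = {}
    # Sort by descending size; ties are broken by key name so the result is deterministic.
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        # Pick the node whose capacity-normalized load is smallest after adding this key.
        target = min(caps, key=lambda name: ((load[name] + size) / caps[name], name))
        load[target] += size
        placement[key] = target
    return placement
```

In a Spark job, a mapping like this could in principle drive a custom partitioner, with the key sizes estimated by sampling. How LBKRS actually gathers metrics and redistributes keys is not stated in the abstract.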
