An Algorithm of Data Skew in Spark Based on Partition



Abstract

Many algorithms have been proposed to address data skew, but most of them are Hadoop-based. Because Spark and Hadoop have different operating mechanisms, many advantages of Hadoop-based algorithms cannot be fully realized in Spark. Tang Zhuo et al. proposed SKRSP, an adaptive partitioning method for handling data skew in Spark applications. Compared with previous work, SKRSP alleviates data skew more effectively, and its benefit grows as the degree of skew increases. However, SKRSP was studied under the assumption that all nodes in the cluster have the same hardware and software configuration. This paper presents a load balancing and key redistribution algorithm based on Spark (LBKRS), which optimizes SKRSP from the perspective of load balancing. By monitoring CPU utilization, memory utilization, and other information about the compute nodes, LBKRS handles data skew better on clusters whose nodes have different configurations and is better suited to real production environments.
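The abstract does not give the details of LBKRS, so the following is only a minimal sketch of the general idea it describes: weighting key redistribution by the monitored load of each node. The capacity formula, the `cpu_weight` parameter, and the function names `node_capacity` and `assign_keys` are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List


def node_capacity(cpu_util: float, mem_util: float, cpu_weight: float = 0.5) -> float:
    """Hypothetical capacity score: a weighted average of idle CPU and idle memory.

    cpu_util and mem_util are utilization fractions in [0, 1].
    The weighting is an assumption, not taken from the paper.
    """
    idle = cpu_weight * (1.0 - cpu_util) + (1.0 - cpu_weight) * (1.0 - mem_util)
    return max(idle, 1e-6)  # keep the score positive for fully saturated nodes


def assign_keys(key_counts: Dict[str, int], capacities: List[float]) -> Dict[str, int]:
    """Greedy load-aware assignment of keys to nodes.

    Keys are processed from most to least frequent. Each key goes to the node
    whose load-to-capacity ratio would be lowest after receiving it.
    """
    load = [0.0] * len(capacities)
    assignment: Dict[str, int] = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        target = min(
            range(len(capacities)),
            key=lambda i: (load[i] + count) / capacities[i],
        )
        assignment[key] = target
        load[target] += count
    return assignment


if __name__ == "__main__":
    # Node 0 is lightly loaded, node 1 is heavily loaded.
    caps = [node_capacity(0.2, 0.2), node_capacity(0.8, 0.8)]
    print(assign_keys({"a": 80, "b": 10, "c": 10}, caps))
```

In this example the heavily skewed key `a` lands on the lightly loaded node, while the two small keys go to the busier node. In a PySpark job, a mapping like this could be consulted inside the `partitionFunc` argument of `RDD.partitionBy`, although tying partitions to specific nodes is outside the scope of this sketch.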


Bibliographic details

Source
International Conference on Computers, Information Processing and Advanced Education, 2020, pp. 217-222 (6 pages)
Authors
Shi Xiujin; Qian Yueqin
Original format
PDF
Keywords
Software algorithms; Clustering algorithms; Load management; Software; Partitioning algorithms; Sparks; Task analysis
Indexed
2022-08-26 13:55:26

