International Conference on Cloud Computing and Security

Big Data Equi-Join Optimization Algorithms on Spark Cloud Computing Platform

Abstract

On the Spark cloud computing platform, conventional big data equi-join algorithms fall short of performance requirements and are very time-consuming, so the efficiency of big data equi-joins is a pressing challenge. To address it, this paper proposes the Compressed Bloom Filter Join algorithm, which filters out most invalid records (those that cannot meet the join criteria) to reduce network overhead, and which constructs a static one-dimensional bit array to improve join performance. In addition, the Compressed Bloom Filter Join Extension algorithm, an extended optimization of Compressed Bloom Filter Join, produces a dynamic two-dimensional bit array to filter out invalid records and can further accelerate the join when the data size is unknown. Experimental results show that both optimized algorithms reduce time consumption and the amount of data in the Shuffle stage, and that they outperform Hash Join and Broadcast Join on the Spark cloud computing platform.
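The general idea behind a Bloom-filter-assisted equi-join can be illustrated with a minimal single-process sketch. This is not the paper's Compressed Bloom Filter Join (the abstract does not give its compression scheme or its two-dimensional extension); it only shows a plain Bloom filter built from the smaller table's keys and used to discard non-matching rows of the larger table before an ordinary hash join. The names `BloomFilter` and `bloom_filter_join` and the 1% target false-positive rate are assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size (static) one-dimensional bit array with k hash probes per key."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # Standard sizing formulas: m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k probe positions from two halves of one digest.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def bloom_filter_join(small, large):
    """Equi-join two lists of (key, value) pairs.

    Returns (joined_rows, survivor_count), where survivor_count is the number
    of rows from `large` that passed the filter (a stand-in for shuffled data).
    """
    bloom = BloomFilter(expected_items=len(small))
    for key, _ in small:
        bloom.add(key)

    # Pre-filter: rows whose key is definitely absent never reach the join.
    survivors = [(k, v) for k, v in large if k in bloom]

    # Ordinary hash join on the survivors; this also discards false positives.
    index = {}
    for k, v in small:
        index.setdefault(k, []).append(v)
    joined = [(k, sv, lv) for k, lv in survivors for sv in index.get(k, [])]
    return joined, len(survivors)


small = [(1, "a"), (2, "b"), (3, "c")]
large = [(i, f"row{i}") for i in range(1000)]
joined, shuffled = bloom_filter_join(small, large)
```

Here `joined` holds the three matching rows, while `shuffled` counts how many of the 1000 large-table rows survived the filter. A Bloom filter never produces false negatives, so every true match survives; the few false positives that slip through are removed by the final hash join. In a distributed setting the bit array would typically be broadcast to the workers so that filtering happens before the shuffle, which is where the reduction in network traffic would come from.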
