
Research on distributed data skew join algorithm based on VGFR model


Abstract

The join operation is a core part of complex analysis tasks over massive data, and handling the data skew that arises during the join is a major challenge. Existing fragment-and-replicate join algorithms for Shared-Nothing distributed architectures have two drawbacks: they incur a large data transmission overhead when a large amount of data must be replicated, and they cannot be adjusted dynamically according to the real-time load. In this paper, we propose and implement a grouping fragment-and-replicate join strategy based on virtual nodes (VGFR), which aims to be robust to the sizes of both join inputs. The mapping between virtual nodes and actual nodes is determined dynamically according to the system's real-time load, which both addresses data skew effectively and provides fine-grained load balancing. Experimental results demonstrate that the optimization technique effectively improves join performance and is adaptable.
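The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of one plausible reading of the idea, not the authors' implementation. It assumes an m x n grid of virtual nodes: one input is split into m fragments, the other into n fragments, and virtual node (i, j) joins fragment i of the first input with fragment j of the second. Virtual nodes are then mapped to physical nodes greedily, with the least-loaded node chosen first. The function names (`fragment`, `assign_virtual_nodes`, `vgfr_join`), the round-robin fragmentation, the cost estimate, and the greedy mapping are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def fragment(rows, parts):
    """Split rows round-robin so that a hot key is spread over all fragments."""
    return [rows[p::parts] for p in range(parts)]


def assign_virtual_nodes(costs, node_loads):
    """Greedily map each virtual node to the currently least-loaded physical node.

    costs: {virtual_node: estimated work}; node_loads: {physical_node: current load}.
    This greedy rule is an assumed stand-in for the paper's load-aware mapping.
    """
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    mapping = {}
    # Place the most expensive virtual nodes first.
    for cell, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        mapping[cell] = node
        heapq.heappush(heap, (load + cost, node))
    return mapping


def local_hash_join(r_frag, s_frag):
    """Plain in-memory hash join of two fragments of (key, value) rows."""
    table = defaultdict(list)
    for key, r_val in r_frag:
        table[key].append(r_val)
    return [(key, r_val, s_val)
            for key, s_val in s_frag
            for r_val in table.get(key, [])]


def vgfr_join(r_rows, s_rows, m, n, node_loads):
    """Join r_rows and s_rows over an m x n grid of virtual nodes.

    Returns the virtual-to-physical mapping and the join output per physical node.
    """
    r_frags, s_frags = fragment(r_rows, m), fragment(s_rows, n)
    # Assumed cost estimate: number of rows shipped to the virtual node.
    costs = {(i, j): len(r_frags[i]) + len(s_frags[j])
             for i in range(m) for j in range(n)}
    mapping = assign_virtual_nodes(costs, node_loads)
    per_node = defaultdict(list)
    for (i, j), node in mapping.items():
        per_node[node].extend(local_hash_join(r_frags[i], s_frags[j]))
    return mapping, dict(per_node)


if __name__ == "__main__":
    # Skewed inputs: the key "hot" dominates both sides.
    r = [("hot", i) for i in range(6)] + [("a", 100), ("b", 101)]
    s = [("hot", i) for i in range(4)] + [("a", 200), ("c", 201)]
    mapping, per_node = vgfr_join(r, s, 2, 2, {"n1": 0, "n2": 0, "n3": 50})
    for node, rows in sorted(per_node.items()):
        print(node, len(rows))
```

Because every row of the first input lies in exactly one of the m fragments and every row of the second in exactly one of the n fragments, each matching pair meets in exactly one virtual node, so the union of the per-node outputs equals the ordinary join with no duplicates. Choosing more virtual nodes than physical nodes gives the mapping step finer units of work to balance, which is the property the abstract attributes to virtual nodes.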
