Journal of Supercomputing

Learning automata-based algorithms for MapReduce data skewness handling

Abstract

MapReduce is one of the most successful techniques for large-scale data processing. However, its performance degrades significantly when the data are skewed. Hashing is the default partitioner in Big Data frameworks such as Hadoop and Spark. Hash partitioning works well when the data are not skewed, but that assumption does not hold for most real-world data. In this paper, we propose two new algorithms based on learning automata for handling reducer-side data skewness in MapReduce applications: the learning automata partitioner (LAP) and the traffic cost-aware partitioner (TCAP). LAP is based on cluster combination and performs well when the degree of data skewness is low. TCAP, on the other hand, takes the network topology into account and balances the network traffic cost in the shuffle phase. TCAP supports cluster splitting and performs well at any degree of data skewness. Both LAP and TCAP can also be used in heterogeneous environments. We evaluated our algorithms through several experiments and simulations using well-known benchmarks. The results confirm that our algorithms outperform other similar algorithms in most cases.
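The sketch below shows why hash partitioning breaks down under skew. It compares per-reducer load under a hash partitioner with a simple cluster-combination baseline on a hypothetical Zipf-like key distribution. The baseline is the classic Longest-Processing-Time (LPT) greedy heuristic, and it is *not* the paper's LAP or TCAP. The names `hash_partition`, `greedy_partition` and `imbalance` are illustrative. `zlib.crc32` is an assumption that stands in for Java's `hashCode()`, because Python's built-in `hash()` of a string is salted per process.

```python
import zlib

def hash_partition(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for Hadoop's HashPartitioner, which computes
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    # crc32 is used only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_loads(key_counts, num_reducers, assign):
    # Sum the records that each reducer receives under a given key->reducer assignment.
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[assign(key)] += count
    return loads

def greedy_partition(key_counts, num_reducers):
    # LPT greedy baseline (not LAP/TCAP): place the largest key cluster first,
    # always onto the currently least-loaded reducer.
    loads = [0] * num_reducers
    mapping = {}
    for key, count in sorted(key_counts.items(), key=lambda kv: -kv[1]):
        target = min(range(num_reducers), key=loads.__getitem__)
        mapping[key] = target
        loads[target] += count
    return mapping, loads

def imbalance(loads):
    # Ratio of the heaviest reducer to the mean load; 1.0 is perfectly balanced.
    return max(loads) / (sum(loads) / len(loads))

if __name__ == "__main__":
    # Hypothetical skewed key frequencies: a few hot keys and many cold ones.
    key_counts = {f"k{i}": 1000 // (i + 1) for i in range(50)}
    hash_loads = reducer_loads(key_counts, 4, lambda k: hash_partition(k, 4))
    _, greedy_loads = greedy_partition(key_counts, 4)
    print("hash:  ", hash_loads, round(imbalance(hash_loads), 2))
    print("greedy:", greedy_loads, round(imbalance(greedy_loads), 2))
```

A single very hot key still bounds the best achievable makespan from below. That limit is why the abstract notes that combination-only approaches such as LAP suit low skew, while TCAP additionally supports cluster splitting.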
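Learning automata are the building block of both proposed algorithms. The following is a minimal sketch of a standard variable-structure automaton with the linear reward-inaction (L_R-I) update. It is used here in a *hypothetical* partitioning loop: each key cluster gets one automaton whose actions are the reducers, and an automaton is rewarded when its chosen reducer stays under an assumed capacity of `slack` times the mean load. The abstract does not specify LAP's actual reward signal, learning rate or cluster-combination rule. Every parameter and function name here (`la_partition`, `episodes`, `learning_rate`, `slack`) is an illustrative assumption, not the authors' method.

```python
import random

class LinearRewardInaction:
    """Variable-structure learning automaton using the L_R-I update scheme."""

    def __init__(self, num_actions, learning_rate, rng):
        self.p = [1.0 / num_actions] * num_actions  # action probability vector
        self.a = learning_rate
        self.rng = rng

    def choose(self):
        # Sample an action according to the current probability vector.
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, rewarded):
        # L_R-I: on reward, shift probability mass toward the chosen action;
        # on penalty, leave the vector unchanged ("inaction"). The sum stays 1.
        if not rewarded:
            return
        for j in range(len(self.p)):
            if j == action:
                self.p[j] += self.a * (1.0 - self.p[j])
            else:
                self.p[j] *= 1.0 - self.a

def la_partition(key_counts, num_reducers, episodes=200,
                 learning_rate=0.05, slack=1.1, seed=0):
    # Hypothetical LA-driven partitioner: one automaton per key cluster.
    rng = random.Random(seed)
    automata = {k: LinearRewardInaction(num_reducers, learning_rate, rng)
                for k in key_counts}
    capacity = slack * sum(key_counts.values()) / num_reducers  # assumed reward threshold
    for _ in range(episodes):
        choice = {k: la.choose() for k, la in automata.items()}
        loads = [0] * num_reducers
        for k, r in choice.items():
            loads[r] += key_counts[k]
        for k, r in choice.items():
            automata[k].update(r, rewarded=loads[r] <= capacity)
    # Final assignment: each cluster goes to its most probable reducer.
    return {k: max(range(num_reducers), key=la.p.__getitem__)
            for k, la in automata.items()}

if __name__ == "__main__":
    key_counts = {f"k{i}": 1000 // (i + 1) for i in range(50)}
    mapping = la_partition(key_counts, num_reducers=4)
    loads = [0] * 4
    for k, r in mapping.items():
        loads[r] += key_counts[k]
    print(loads)
```

L_R-I was chosen because it is the simplest absolutely expedient scheme. A reward-penalty variant (L_R-P) would also react to overloaded reducers, at the cost of slower convergence.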