Conference: Database systems for advanced applications

A Distributed Load Balance Algorithm of MapReduce for Data Quality Detection


Abstract

Big data quality detection is an important problem in the data quality field. MapReduce is a widely used distributed data processing model, mainly for big data processing, and load balance is a key factor that influences its performance. In this paper, we propose a distributed greedy approximation algorithm for the load balance problem in MapReduce for data quality detection. There are three key challenges: (a) show that the problem is NP-complete and prove a considerable approximation ratio for the proposed algorithm; (b) impose only one more round of MapReduce than conventional processing, and take up minimal time in the overall process; (c) be simple and practical to implement. Experimental results on real-life and synthetic data demonstrate that the proposed algorithm is effective for load balance.
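The abstract does not spell out the algorithm itself, so the following is only an illustrative sketch of what a greedy load-balancing step can look like, not the paper's method. It uses the classic longest-processing-time-first rule: key groups are sorted by size and each is handed to the currently least-loaded reducer. The function name `greedy_assign`, the `key_sizes` input, and the single-machine setting are all assumptions made for illustration; in an actual MapReduce job the group sizes would presumably be gathered in an extra sampling or counting round.

```python
import heapq


def greedy_assign(key_sizes, num_reducers):
    """Hypothetical helper: assign key groups to reducers greedily.

    key_sizes maps each key to its estimated record count (assumed known).
    Returns (assignment, loads), where assignment maps key -> reducer id
    and loads[r] is the total size assigned to reducer r.
    """
    # Min-heap of (current_load, reducer_id); the root is the least-loaded reducer.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)

    assignment = {}
    loads = [0] * num_reducers

    # Largest groups first; ties broken by key so the result is deterministic.
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        loads[reducer] = load + size
        heapq.heappush(heap, (loads[reducer], reducer))

    return assignment, loads


if __name__ == "__main__":
    sizes = {"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}
    assignment, loads = greedy_assign(sizes, num_reducers=2)
    print(assignment)
    print(loads)  # [10, 10]
```

In this toy input the two reducers end up with equal loads. In general the rule only approximates the optimal maximum load, which is the usual trade-off for a greedy heuristic on an NP-complete balancing problem.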
