
Automated load-balancing of partitions in arbitrarily imbalanced distributed mapreduce computations


Abstract

A distributed computing system executes a MapReduce job on streamed data whose data keys may have an arbitrarily imbalanced frequency distribution. A map task module maps the dataset to a coarse partitioning and generates a list of the top K keys with the highest frequency in the dataset. A sort task module employs a plurality of sorters to read the coarse partitioning and sort the data into buckets by data key. The values for each of the top K most frequent keys are separated into single-key buckets. The remaining, less frequent keys are assigned to buckets that each hold multiple keys. More than one worker is then assigned to each single-key bucket, and the outputs of the multiple workers assigned to each single-key bucket are stitched together.
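The bucketing scheme described in the abstract can be illustrated with a small single-process sketch. This is not the patented implementation: the function names (`top_k_keys`, `assign_buckets`, `reduce_single_key_bucket`), the use of CRC32 to place the less frequent keys, and the sequential simulation of workers are all assumptions made for illustration. It also assumes the reduce operation can be applied to slices of a key's values and the partial results combined afterwards.

```python
from collections import Counter, defaultdict
from zlib import crc32


def top_k_keys(records, k):
    """Return the set of the k most frequent keys among (key, value) records."""
    counts = Counter(key for key, _ in records)
    return {key for key, _ in counts.most_common(k)}


def assign_buckets(records, hot_keys, num_shared_buckets):
    """Give each hot key its own bucket; hash all other keys into shared buckets."""
    single_key = defaultdict(list)  # hot key -> list of its values
    shared = defaultdict(list)      # bucket index -> list of (key, value) pairs
    for key, value in records:
        if key in hot_keys:
            single_key[key].append(value)
        else:
            # Hypothetical placement rule: CRC32 of the key modulo bucket count.
            index = crc32(key.encode()) % num_shared_buckets
            shared[index].append((key, value))
    return single_key, shared


def reduce_single_key_bucket(values, num_workers, reduce_fn, stitch_fn):
    """Split one hot key's values into slices, reduce each, then stitch the results.

    Workers are simulated one after another here; a real system would run
    each slice on a separate worker.
    """
    chunk = max(1, -(-len(values) // num_workers))  # ceiling division
    partials = [reduce_fn(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return stitch_fn(partials)


# Usage: key "a" dominates the data, so it gets a single-key bucket
# that is processed by three simulated workers.
records = [("a", 1)] * 6 + [("b", 2), ("c", 3), ("b", 4)]
hot = top_k_keys(records, k=1)
single_key, shared = assign_buckets(records, hot, num_shared_buckets=2)
total_a = reduce_single_key_bucket(single_key["a"], 3, sum, sum)
```

In this sketch, stitching is just a second application of `sum` over the partial results. A reduce function that cannot be combined this way would need a different stitching step, which the abstract does not specify.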

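Bibliographic details

  • Publication number: US9613127B1
  • Publication date: 2017-04-04
  • Original format: PDF
  • Applicant/assignee: QUANTCAST CORPORATION
  • Application number: US201414320373
  • Inventors: SILVIUS V. RUS; WEI JIANG
  • Filing date: 2014-06-30
  • Classification: G06F17/30
  • Country: US
  • Indexed: 2022-08-21 13:41:38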
