Journal: International Journal of Parallel Programming

A Cluster-Based Data-Centric Model for Network-Aware Task Scheduling in Distributed Systems


Abstract

Big Data processing architectures are now widely recognized as one of the most significant innovations in computing of the last decade. Their enormous potential for collecting and processing huge volumes of data scattered throughout the Internet is opening the door to a new generation of fully distributed applications that, by leveraging the large amount of resources available on the network, will be able to cope with very complex problems and achieve unprecedented performance. However, the Internet is known to have severe scalability limitations in moving very large quantities of data. These limitations introduce the challenge of making efficient use of the computing and storage resources available on the network, so that data-intensive applications can be executed effectively in such a complex distributed environment. This implies resource scheduling decisions that drive the execution of tasks towards the data, taking network load and capacity into consideration to maximize data access performance and reduce queueing and processing delays as much as possible. Accordingly, this work presents a data-centric meta-scheduling scheme for fully distributed Big Data processing architectures. The scheme is based on clustering techniques whose goal is to aggregate tasks around storage repositories, and it is driven by a new concept of "gravitational" attraction between tasks and their data of interest. It benefits from heuristic criteria based on network awareness and advance resource reservation in order to suppress long delays in data transfer operations, resulting in an optimized use of data storage and runtime resources at the expense of a limited (polynomial) computational complexity.
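The abstract does not give the attraction formula or the clustering procedure, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' method. It assumes a hypothetical gravity-like score (data held locally at a site, divided by the square of one plus the estimated time to pull the remaining data over the network) and a simple greedy placement that reserves one compute slot per task. All names (`transfer_time`, `attraction`, `schedule`), the score itself, and the sample numbers are illustrative assumptions.

```python
# Hypothetical sketch: none of these names or formulas come from the paper.

def transfer_time(task_data, site, bandwidth):
    """Estimated seconds to pull all non-local input data to `site`.

    task_data: {repository name: GB of this task's input stored there}
    bandwidth: {(source, destination): available GB/s on that path}
    """
    return sum(size / bandwidth[(src, site)]
               for src, size in task_data.items() if src != site)


def attraction(task_data, site, bandwidth):
    """Assumed gravity-like score: local data 'mass' over squared 'distance'."""
    local = task_data.get(site, 0.0)
    distance = 1.0 + transfer_time(task_data, site, bandwidth)
    return local / distance ** 2


def schedule(tasks, slots, bandwidth):
    """Greedily reserve a slot for each task at its most attractive site."""
    free = dict(slots)  # copy so the caller's slot counts are untouched
    placement = {}
    for name, data in tasks.items():
        candidates = [s for s in free if free[s] > 0]
        if not candidates:
            placement[name] = None  # no capacity left anywhere
            continue
        best = max(candidates, key=lambda s: attraction(data, s, bandwidth))
        free[best] -= 1  # stands in for an advance reservation
        placement[name] = best
    return placement


# Made-up example: two repositories, three tasks.
tasks = {
    "t1": {"A": 100.0, "B": 10.0},
    "t2": {"A": 80.0, "B": 5.0},
    "t3": {"B": 50.0},
}
slots = {"A": 1, "B": 2}
bandwidth = {("A", "B"): 1.0, ("B", "A"): 1.0}

placement = schedule(tasks, slots, bandwidth)
print(placement)  # {'t1': 'A', 't2': 'B', 't3': 'B'}
```

In this toy run, t1 takes the only slot at repository A because most of its data lives there, which forces t2 to B even though its data is mostly at A. A real meta-scheduler of the kind the abstract describes would presumably cluster tasks jointly rather than place them one by one. The loop above runs in time proportional to the number of tasks times the number of sites times the repositories holding each task's data, which is polynomial, in line with the complexity class the abstract mentions.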
