Albatross: An efficient cloud-enabled task scheduling and execution framework using distributed message queues

Abstract

Data Analytics on large datasets has become very popular across organizations. Using distributed resources such as Clouds is inevitable for Data Analytics and other types of data processing at larger scales. To effectively utilize all system resources, an efficient scheduler is needed, but traditional resource managers and job schedulers are centralized and designed for a small number of large batch jobs. Frameworks such as Hadoop and Spark, which are mainly designed for Big Data analytics, allow for more diversity in job types to some extent. However, even these systems have centralized architectures and will not perform well at large scales and under heavy task loads. Modern applications generate tasks at very high rates, which can cause significant slowdowns on these frameworks. Additionally, over-decomposition has been shown to be very useful in increasing system utilization. To achieve high efficiency, scalability, and better system utilization, it is critical for a modern scheduler to handle over-decomposition and run highly granular tasks. Further, to achieve high performance, Albatross is written in C/C++, which imposes minimal overhead on the workload process compared to languages like Java or Python. We propose Albatross, a task-level scheduling and execution framework that uses a Distributed Message Queue (DMQ) for task distribution among its workers. Unlike most scheduling systems, Albatross uses a pull approach rather than the common push approach. The pull approach lets Albatross achieve good load balancing and scalability. Furthermore, the framework has built-in support for task execution dependencies in workflows. Therefore, Albatross is able to run various types of workloads, including Data Analytics and HPC applications. Finally, Albatross provides data locality support. This allows the framework to achieve higher performance by minimizing unnecessary data movement on the network. Our evaluations show that Albatross outperforms Spark and Hadoop at larger scales and when running higher-granularity workloads.
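
The pull approach described in the abstract can be illustrated with a minimal sketch. This is not Albatross's implementation (which the abstract says is written in C/C++ on top of a distributed message queue). As a labeled assumption, an in-process `queue.Queue` stands in for the DMQ and threads stand in for workers, and the function name `run_pull_workers` is hypothetical. The point is only that each idle worker fetches its next task itself, so no central dispatcher has to decide where tasks go.

```python
import queue
import threading


def run_pull_workers(tasks, num_workers=4):
    """Run callables by letting idle workers pull them from a shared queue.

    Assumption: queue.Queue is a stand-in for a distributed message queue.
    """
    task_queue = queue.Queue()
    for task_id, fn in enumerate(tasks):
        task_queue.put((task_id, fn))

    results = {}      # task_id -> return value
    executed_by = {}  # task_id -> id of the worker that ran it
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                # Pull: the worker asks for work only when it is idle.
                task_id, fn = task_queue.get_nowait()
            except queue.Empty:
                return  # all tasks were enqueued up front, so empty means done
            value = fn()
            with lock:
                results[task_id] = value
                executed_by[task_id] = worker_id

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, executed_by


if __name__ == "__main__":
    # Many small tasks, in the spirit of over-decomposition.
    tasks = [lambda n=n: n * n for n in range(20)]
    results, executed_by = run_pull_workers(tasks)
    print(len(results), "tasks completed")
```

Because workers take tasks at their own pace, a worker that finishes quickly simply pulls more often, which is the intuition behind the load-balancing claim. Task dependencies and data locality are not modeled in this sketch.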

Bibliographic details

Source: IEEE International Conference on e-Science, 2016, pp. 11-20 (10 pages)
Authors: Iman Sadooghi; Geet Kumar; Ke Wang; Dongfang Zhao; Tonglin Li; Ioan Raicu
Original format: PDF
Keywords: Data analysis; Sparks; Job shop scheduling; Scalability; Servers
