Multi-job Hadoop scheduling to process Geo-distributed big data

Abstract

Effective big data analysis is one of the most notable research challenges of the last few years. Hadoop, the most popular implementation of the MapReduce framework, has become widely used for processing large data sets with cloud resources. However, in many scenarios data are geographically distributed across data centers, and moving them to a single site for processing may be extremely expensive, when it is feasible at all. A key challenge for running applications in such a geographically distributed environment is how to efficiently schedule the computation over the different data centers. In this work we present a job scheduler for a Hierarchical Hadoop Framework (H2F) that manages multiple job execution requests while ensuring efficient use of the available resources. Our experimental evaluations show that H2F significantly improves processing time for geo-distributed data sets with respect to a plain Hadoop system.
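
To make the scheduling problem concrete, here is a minimal, self-contained Java sketch of a data-locality heuristic for geo-distributed MapReduce jobs. It is not the H2F scheduler described in the paper; all class and method names are hypothetical. The assumption illustrated is simply that each job's map phase runs in the data center holding the largest share of its input, so only the smaller remainder of the data crosses the wide-area network.

```java
import java.util.*;

// Illustrative sketch only: a locality-aware placement heuristic for
// geo-distributed MapReduce jobs. NOT the H2F algorithm from the paper.
public class GeoJobPlacementSketch {

    // A block of input data residing in a given data center, in megabytes.
    record DataBlock(String job, String dataCenter, long sizeMb) {}

    public static void main(String[] args) {
        List<DataBlock> blocks = List.of(
            new DataBlock("jobA", "dc-eu", 4_000),
            new DataBlock("jobA", "dc-us", 1_000),
            new DataBlock("jobB", "dc-us", 6_000)
        );

        // Aggregate input size per job and per data center.
        Map<String, Map<String, Long>> perJob = new HashMap<>();
        for (DataBlock b : blocks) {
            perJob.computeIfAbsent(b.job(), j -> new HashMap<>())
                  .merge(b.dataCenter(), b.sizeMb(), Long::sum);
        }

        // Greedy rule: place each job where most of its input already lives,
        // and report how much data would still have to move over the WAN.
        perJob.forEach((job, sizes) -> {
            String target = Collections.max(sizes.entrySet(),
                    Map.Entry.comparingByValue()).getKey();
            long moved = sizes.values().stream()
                    .mapToLong(Long::longValue).sum() - sizes.get(target);
            System.out.printf("%s -> run at %s, %d MB moved over WAN%n",
                    job, target, moved);
        });
    }
}
```

A real hierarchical scheduler would additionally weigh per-site compute capacity, inter-site bandwidth, and contention among the multiple concurrent jobs; the sketch only captures the data-locality term of that trade-off.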
