Scheduling Shared Scans of Large Data Files

Abstract

We study how best to schedule scans of large data files, in the presence of many simultaneous requests to a common set of files. The objective is to maximize the overall rate of processing these files, by sharing scans of the same file as aggressively as possible, without imposing undue wait time on individual jobs. This scheduling problem arises in batch data processing environments such as Map-Reduce systems, some of which handle tens of thousands of processing requests daily, over a shared set of files.

As we demonstrate, conventional scheduling techniques such as shortest-job-first do not perform well in the presence of cross-job sharing opportunities. We derive a new family of scheduling policies specifically targeted to sharable workloads. Our scheduling policies revolve around the notion that, all else being equal, it is good to schedule nonsharable scans ahead of ones that can share IO work with future jobs, if the arrival rate of sharable future jobs is expected to be high. We evaluate our policies via simulation over varied synthetic and real workloads, and demonstrate significant performance gains compared with conventional scheduling approaches.
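
The intuition in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the policies derived in the paper: the function `simulate`, the file names `A` and `B`, the scan times, and the rule that a scan serves only the jobs already waiting when it starts are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def simulate(jobs, scan_time, priority):
    """Toy single-executor model of shared scans (illustrative assumptions only).

    jobs:      list of (arrival_time, file_name)
    scan_time: dict mapping file_name -> time to scan that file once
    priority:  all file names, most preferred first
    Returns (number_of_scans, finish_time, mean_response_time).
    """
    arrivals = sorted(jobs)        # jobs ordered by arrival time
    pending = defaultdict(list)    # file -> arrival times of waiting jobs
    clock, scans, total_response = 0, 0, 0
    i = 0
    while i < len(arrivals) or any(pending.values()):
        # Admit every job that has arrived by now.
        while i < len(arrivals) and arrivals[i][0] <= clock:
            arrival, name = arrivals[i]
            pending[name].append(arrival)
            i += 1
        if not any(pending.values()):
            clock = arrivals[i][0]  # idle until the next arrival
            continue
        # Scan the most preferred file that has waiting jobs.
        name = next(f for f in priority if pending[f])
        batch = pending.pop(name)   # one scan serves all jobs waiting on it
        clock += scan_time[name]
        scans += 1
        total_response += sum(clock - a for a in batch)
    return scans, clock, total_response / len(jobs)

# File A is popular (a second request arrives at t=5); file B is requested once.
jobs = [(0, "A"), (0, "B"), (5, "A")]
scan_time = {"A": 10, "B": 10}

sharable_first = simulate(jobs, scan_time, ["A", "B"])
nonsharable_first = simulate(jobs, scan_time, ["B", "A"])
print(sharable_first)     # 3 scans, finishes at t=30
print(nonsharable_first)  # 2 scans, finishes at t=20, mean response 15.0
```

In this toy workload, scanning the nonsharable file `B` first lets both requests for `A` accumulate and be served by a single scan, so the same jobs finish with fewer scans and a lower mean response time.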

Bibliographic details

Source: International Conference on Very Large Data Bases (VLDB 2008), 2008, pp. 957-968 (12 pages)
Authors: Parag Agrawal; Daniel Kifer; Christopher Olston
Original format: PDF

Similar literature

1. On the optimization of schedules for MapReduce workloads in the presence of shared scans [J]. Joel Wolf, Andrey Balmin, Deepak Rajan. The VLDB Journal. 2012, No. 5
2. CIRCUMFLEX: A Scheduling Optimizer for MapReduce Workloads With Shared Scans [J]. Joel Wolf, Kirsten Hildrum, Andrey Balmin. Operating Systems Review. 2012, No. 1
3. Shared data-aware dynamic resource provisioning and task scheduling for data intensive applications on hybrid clouds using Aneka [J]. Shreshth Tuli, Rajinder Sandhu, Rajkumar Buyya. Future Generation Computer Systems. 2020, May issue
4. Scheduling Shared Scans of Large Data Files [C]. Parag Agrawal, Daniel Kifer, Christopher Olston. International Conference on Very Large Data Bases. 2008
5. A Torrent of Copyright Infringement? Liability for BitTorrent File-Sharers and File-Sharing Facilitators Under Current and Proposed Canadian Copyright Law [D]. Mendelsohn, Allen. 2011
6. A joint design for functional data with application to scheduling ultrasound scans [O]. So Young Park, Luo Xiao, Jayson D. Willbur.
7. Scheduling Shared Scans of Large Data Files [O]. Parag Agrawal, Daniel Kifer, Christopher Olston. 2009
8. Development of Data File Standards for Automated Ultrasonic Scanning Systems. Exchange Methods for Digital Ultrasonic Inspection Data. [R]. Berger, H., Jones, T. S., Gaynor, E. S. 1991
