IEEE International Conference on Data Engineering Workshops

Towards Batch-Processing on Cold Storage Devices

Abstract

A large amount of the data in storage systems is cold, i.e., Written Once and Read Occasionally (WORO). The rapid growth of massive-scale archival and historical data increases the demand for cheap, petabyte-scale storage for such cold data. A Cold Storage Device (CSD) is a disk-based storage system designed to trade off performance for cost and power efficiency. Inevitably, the design restrictions used in CSDs result in performance limitations. These limitations are not a concern for WORO workloads; however, the very low price/performance characteristics of CSDs make them interesting for other applications too, e.g., batch processes. Applications, however, can be very slow on CSDs if they do not take the devices' characteristics into account. In this paper, we design two strategies for data partitioning in CSDs, a crucial operation in many batch analytics tasks such as hash-join, near-duplicate detection, and data localization. We show that our strategies can efficiently use CSDs for batch processing of terabyte-scale data, accelerating data partitioning by 3.5x in our experiments.
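
For readers unfamiliar with the operation the abstract refers to, the following is a minimal sketch of generic in-memory hash partitioning, the kind of step a hash-join performs before matching records. It is not one of the two CSD strategies from the paper, which the abstract does not describe; the function name `partition_by_key`, the `(key, value)` record layout, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def partition_by_key(records, num_partitions):
    """Group (key, value) records into buckets by a stable hash of the key.

    Hypothetical helper for illustration only; not the paper's method.
    """
    partitions = defaultdict(list)
    for key, value in records:
        # crc32 gives the same result on every run, unlike Python's
        # built-in hash() for strings, which is randomized per process.
        bucket = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[bucket].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    sample = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
    for bucket, rows in sorted(partition_by_key(sample, 3).items()):
        print(bucket, rows)
```

Because every record with the same key lands in the same bucket, a later join or duplicate check only has to compare records within one bucket at a time.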
