Towards Batch-Processing on Cold Storage Devices

Abstract

A large amount of the data in storage systems is cold, i.e., Written Once and Read Occasionally (WORO). The rapid growth of massive-scale archival and historical data increases the demand for cheap, petabyte-scale storage for such cold data. A Cold Storage Device (CSD) is a disk-based storage system designed to trade off performance for cost and power efficiency. Inevitably, the design restrictions used in CSDs result in performance limitations. These limitations are not a concern for WORO workloads; however, the very low price/performance characteristics of CSDs make them interesting for other applications too, e.g., batch processes. Such applications, however, can be very slow on CSDs if they do not take the devices' characteristics into account. In this paper we design two strategies for data partitioning in CSDs, a crucial operation in many batch analytics tasks such as hash-join, near-duplicate detection, and data localization. We show that our strategies can efficiently use CSDs for batch processing of terabyte-scale data, accelerating data partitioning by 3.5x in our experiments.
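For readers unfamiliar with the data partitioning operation the abstract refers to, the following is a minimal, generic sketch of hash partitioning as used in tasks like hash-join. It is not either of the paper's two strategies, which the abstract does not describe. The function name `hash_partition`, its parameters, and the choice of CRC32 as the hash are assumptions made purely for illustration.

```python
import zlib
from collections import defaultdict


def hash_partition(records, num_partitions, key=lambda r: r[0]):
    """Group records into buckets by a stable hash of their key.

    Illustrative only: a generic in-memory hash partitioner, not the
    CSD-aware strategies proposed in the paper.
    """
    partitions = defaultdict(list)
    for record in records:
        # CRC32 is stable across runs, unlike Python's built-in hash() for str.
        encoded_key = str(key(record)).encode("utf-8")
        bucket = zlib.crc32(encoded_key) % num_partitions
        partitions[bucket].append(record)
    return dict(partitions)


if __name__ == "__main__":
    rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
    for bucket, items in sorted(hash_partition(rows, 4).items()):
        print(bucket, items)
```

Records that share a key always land in the same bucket, which is what lets a later join or duplicate-detection step process each bucket independently.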
