Hot and Cold Data Identification: Applications to Storage Devices and Systems.

Abstract

Hot data identification is critically important in storage systems: it strongly affects their overall performance and has considerable potential in many other fields. Nevertheless, it has received little attention. In this dissertation, I propose two novel hot data identification schemes: (1) a multiple Bloom filter-based scheme and (2) a sampling-based scheme. I then apply them to storage devices and systems, namely Solid State Drives (SSDs) and a data deduplication system.

In the multiple Bloom filter-based scheme, I adopt multiple Bloom filters and hash functions and assign a different recency coverage to each filter, which captures finer-grained recency information as well as frequency information efficiently. The sampling-based scheme uses a sampling mechanism to discard some cold items early, reducing runtime overhead and wasted memory. Both schemes identify hot data in storage precisely and efficiently while using fewer system resources.

Based on these approaches, I choose two storage areas as applications: NAND flash-based SSD design and data deduplication systems. In SSD design in particular, hot data identification has a critical impact on performance (through garbage collection) and on life span (through wear leveling). To address these issues, I propose a new hybrid Flash Translation Layer (FTL), a core component of an SSD. The proposed FTL, named CFTL, adapts to data access patterns with the help of the multiple Bloom filter-based hot data identification algorithm.

As the other application, I explore a data deduplication storage system. Data deduplication (dedupe for short) is a specialized data compression technique that has been widely adopted, especially in backup storage systems, to save both backup time and storage space. Unlike traditional dedupe research, which has focused mainly on improving write performance, I address read performance. In this part of the work, I design a new read cache for dedupe storage in a backup application; it improves read performance by looking ahead at future references within a moving window, in combination with a hot data identification algorithm.

This dissertation addresses the importance of hot data identification in storage and shows how it can be applied effectively to overcome existing limitations in each of these storage areas.
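As a rough illustration of the multiple Bloom filter idea described in the abstract, the Python sketch below keeps several small Bloom filters and clears them one at a time in round-robin order, so each filter covers a different span of recent writes. It is a simplified sketch under stated assumptions, not the dissertation's actual algorithm: the class name `MultiBloomHotDetector`, its parameters (`num_filters`, `bits`, `num_hashes`, `decay_interval`, `hot_threshold`), and the unweighted hit count used as the hotness measure are all hypothetical choices made for illustration.

```python
import hashlib


class MultiBloomHotDetector:
    """Illustrative hot-data detector built from several small Bloom filters.

    All parameter names and default values are assumptions for this sketch.
    """

    def __init__(self, num_filters=4, bits=1024, num_hashes=2,
                 decay_interval=8, hot_threshold=2):
        self.filters = [bytearray(bits) for _ in range(num_filters)]
        self.bits = bits
        self.num_hashes = num_hashes          # at most 8 with a SHA-256 digest
        self.decay_interval = decay_interval  # writes between filter resets
        self.hot_threshold = hot_threshold    # filters that must contain a key
        self.writes = 0
        self.next_insert = 0  # filter tried first for the next write
        self.next_reset = 0   # filter cleared at the next decay step

    def _positions(self, key):
        # Derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(str(key).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits
                for i in range(self.num_hashes)]

    @staticmethod
    def _contains(filt, positions):
        return all(filt[p] for p in positions)

    def record_write(self, key):
        """Record one write; a key gains at most one filter per write."""
        positions = self._positions(key)
        n = len(self.filters)
        for offset in range(n):
            filt = self.filters[(self.next_insert + offset) % n]
            if not self._contains(filt, positions):
                for p in positions:
                    filt[p] = 1
                break
        self.next_insert = (self.next_insert + 1) % n
        self.writes += 1
        # Decay: periodically clear one filter so old writes are forgotten.
        if self.writes % self.decay_interval == 0:
            self.filters[self.next_reset] = bytearray(self.bits)
            self.next_reset = (self.next_reset + 1) % n

    def hit_count(self, key):
        """Number of filters that (probably) contain the key."""
        positions = self._positions(key)
        return sum(self._contains(f, positions) for f in self.filters)

    def is_hot(self, key):
        return self.hit_count(key) >= self.hot_threshold


if __name__ == "__main__":
    detector = MultiBloomHotDetector(decay_interval=1000)
    detector.record_write(7)          # written once
    for _ in range(3):
        detector.record_write(42)     # written repeatedly
    print(detector.is_hot(42), detector.is_hot(7))
```

In this sketch the number of filters containing a key approximates its write frequency, and the round-robin reset supplies the recency component. Because Bloom filters admit false positives, a cold key can occasionally be reported as hot; larger `bits` values make that less likely.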

Bibliographic record

  • Author

    Park, Dongchul

  • Author affiliation

    University of Minnesota

  • Degree-granting institution University of Minnesota
  • Discipline Computer science
  • Degree Ph.D.
  • Year 2012
  • Pages 143
  • Original format PDF
  • Language English
