
An Efficient Data Deduplication Design with Flash-Memory Based Solid State Drive.


Abstract

Today, a predominant portion of Internet services (e.g., content delivery networks, online backup storage, news broadcasting, blog sharing, and social networks) is data-centric. These services generate a significant amount of new data every day, and a large portion of that data is redundant. Data deduplication is a prevailing technique used to identify and eliminate redundant data, reducing the space requirement for both primary file systems and data backups.

The variety of objectives in a deduplication system design is the primary interest of this dissertation. These objectives include maximizing the amount of redundant data removed and achieving high deduplication read/write throughput with minimal RAM overhead per chunk. To achieve the first objective, this dissertation proposes a novel chunking algorithm that breaks the input dataset into chunks with higher redundancy or larger sizes, identifying more duplicate data without producing a larger number of chunks than other chunking algorithms. To achieve high deduplication throughput while minimizing RAM overhead per chunk, this dissertation proposes a RAM-frugal chunk index design along with a chunk filter that screens out index lookups on nonexistent chunks. Both the index and filter designs make efficient use of a very limited RAM space, with flash memory serving as persistent storage. In particular, the proposed chunk filter design can dynamically scale up to adapt to the growth of the dataset. In addition, the proposed chunk index design achieves high-throughput, low-latency chunk lookup/insert operations with extremely low RAM overhead, at the sub-byte-per-chunk level.
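The abstract outlines three cooperating components: a chunking stage, a chunk filter that avoids index lookups for chunks never seen before, and a fingerprint index. The sketch below illustrates how such a write path could fit together; it is not the dissertation's implementation, and the chunking heuristic, filter sizing, and the in-memory dict standing in for the flash-resident index are illustrative assumptions only.

import hashlib

# Assumed chunk-size bounds (~8 KiB average target); not parameters from the thesis.
MIN_CHUNK, AVG_MASK, MAX_CHUNK = 2048, 0x1FFF, 16384

def chunk(data: bytes):
    """Content-defined chunking with a simple byte-wise rolling sum."""
    start, rolling = 0, 0
    for i, b in enumerate(data):
        rolling = ((rolling << 1) + b) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (rolling & AVG_MASK) == 0) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start, rolling = i + 1, 0
    if start < len(data):
        yield data[start:]

class ChunkFilter:
    """Bloom-filter-style membership test: a negative answer means the chunk is
    definitely new, so the (flash-resident) chunk index need not be consulted."""
    def __init__(self, bits=1 << 20, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.bitmap = bytearray(bits // 8)
    def _positions(self, fp: bytes):
        for k in range(self.hashes):
            h = int.from_bytes(hashlib.sha256(fp + bytes([k])).digest()[:8], "big")
            yield h % self.bits
    def add(self, fp: bytes):
        for p in self._positions(fp):
            self.bitmap[p // 8] |= 1 << (p % 8)
    def may_contain(self, fp: bytes) -> bool:
        return all(self.bitmap[p // 8] & (1 << (p % 8)) for p in self._positions(fp))

def deduplicate(data: bytes):
    """Write path: store only chunks whose fingerprints are not already indexed."""
    chunk_filter, index, recipe, unique_bytes = ChunkFilter(), {}, [], 0
    for c in chunk(data):
        fp = hashlib.sha1(c).digest()            # chunk fingerprint
        if not (chunk_filter.may_contain(fp) and fp in index):
            index[fp] = c                        # new chunk: index and store it
            chunk_filter.add(fp)
            unique_bytes += len(c)
        recipe.append(fp)                        # the file becomes a list of fingerprints
    return recipe, index, unique_bytes

In this toy version, duplicate chunks cost only a filter probe plus one index lookup, and chunks rejected by the filter skip the index entirely, which is the effect the abstract attributes to the chunk filter.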

Record Details

  • Author: Lu, Guanlin
  • Author affiliation: University of Minnesota
  • Degree-granting institution: University of Minnesota
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 116 p.
  • Total pages: 116
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
  • Date added: 2022-08-17 11:43:32
