Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST'08)

Avoiding the Disk Bottleneck in the Data Domain Deduplication File System



Abstract

Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection to replace tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form and makes it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, which enables backups to complete quickly. A significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to store an index of the stored segments and may be forced to access an on-disk index for every input segment. This paper describes three techniques employed in the production Data Domain deduplication file system to relieve the disk bottleneck. These techniques include: (1) the Summary Vector, a compact in-memory data structure for identifying new segments; (2) Stream-Informed Segment Layout, a data layout method to improve on-disk locality for sequentially accessed segments; and (3) Locality Preserved Caching, which maintains the locality of the fingerprints of duplicate segments to achieve high cache hit ratios. Together, they can remove 99% of the disk accesses for deduplication of real world workloads. These techniques enable a modern two-socket dual-core system to run at 90% CPU utilization with only one shelf of 15 disks and achieve 100 MB/sec for single-stream throughput and 210 MB/sec for multi-stream throughput.
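The abstract describes the Summary Vector only as a compact in-memory structure for ruling out segments that have never been stored. A Bloom filter is the natural fit for that role; the sketch below is an illustrative assumption, not the paper's implementation, and the sizing parameters and hash construction are invented for the example.

```python
import hashlib

class SummaryVector:
    """Bloom-filter sketch of a Summary Vector (parameters are illustrative)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, fingerprint: bytes):
        # Derive k bit positions from the segment fingerprint by salting
        # a hash with the probe index.
        for i in range(self.num_hashes):
            h = hashlib.sha256(i.to_bytes(4, "big") + fingerprint).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, fingerprint: bytes):
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        # False => the segment is definitely new, so the costly on-disk
        # index lookup can be skipped entirely.
        # True  => possibly a duplicate; fall through to the cache/index.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))
```

Because the filter admits no false negatives, a `False` answer safely avoids any disk access for new segments, which is where most of the 99% disk-access reduction for fresh data comes from.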
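Locality Preserved Caching, per the abstract, keeps the fingerprints of segments that were laid out together so that one duplicate hit prefetches its neighbors. A minimal sketch under assumed names (the container map, group size, and eviction policy here are all hypothetical): on a cache miss that resolves through the on-disk index, the whole container's fingerprint group is inserted, not just the single fingerprint.

```python
from collections import OrderedDict

# Hypothetical on-disk state: container id -> fingerprints of segments
# written together under a stream-informed layout.
CONTAINERS = {
    0: [b"fp-a", b"fp-b", b"fp-c"],
    1: [b"fp-d", b"fp-e"],
}
FP_TO_CONTAINER = {fp: cid for cid, fps in CONTAINERS.items() for fp in fps}

class LocalityPreservedCache:
    """Caches fingerprints in container-sized groups with LRU eviction."""

    def __init__(self, max_groups=2):
        self.max_groups = max_groups
        self.groups = OrderedDict()  # container id -> set of fingerprints
        self.disk_lookups = 0        # counts trips to the on-disk index

    def is_duplicate(self, fp: bytes) -> bool:
        for cid, fps in self.groups.items():
            if fp in fps:
                self.groups.move_to_end(cid)  # LRU touch on hit
                return True
        # Miss: consult the on-disk index (the expensive path).
        self.disk_lookups += 1
        cid = FP_TO_CONTAINER.get(fp)
        if cid is None:
            return False  # genuinely new segment
        # Insert the entire container's fingerprint group, preserving
        # the locality of the duplicate's neighbors.
        self.groups[cid] = set(CONTAINERS[cid])
        if len(self.groups) > self.max_groups:
            self.groups.popitem(last=False)  # evict least-recently-used group
        return True
```

Because backup streams repeat long runs of segments in the same order they were first written, one disk lookup amortizes over the whole group, which is how sequential duplicate runs achieve near-perfect cache hit ratios.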
