SAUD: Semantics-Aware and Utility-Driven Deduplication Framework for Primary Storage

Abstract

Data deduplication is an efficient technology for reducing storage cost in cloud storage systems, especially now that massive data volumes have become the norm in the era of Big Data. Primary storage, as the layer that interacts directly with service users, has benefited from deduplication technologies because of its high manufacturing cost. However, because primary storage is constantly accessed by users, its workloads are mostly latency-sensitive. This workload characteristic makes it challenging to develop deduplication schemes for primary storage systems that are both performance-efficient and space-efficient. Existing deduplication schemes for primary storage pay little attention to achieving desirable space savings while keeping the inherent performance penalty small. In this paper, we propose SAUD, a Semantics-Aware and Utility-Driven deduplication framework that provides high space savings with a minor performance penalty for primary storage. SAUD delivers performance-oriented deduplication service by leveraging the file-level semantics of primary storage in a quantitative way. SAUD calculates a deduplication priority for files with diverse semantics, and these priorities serve as deduplication instructions. Moreover, SAUD operates in a selective-on mode, dynamically regulating the deduplication process based on the real-time workload and system status, which further reduces the side effects on system performance. Comprehensive evaluations show that SAUD outperforms all other compared schemes on system read performance by an average of 54.6%. SAUD achieves 82.1% of the space efficiency of the most space-oriented scheme, whose read performance falls behind that of SAUD by as much as 80.1%.


Bibliographic record

Source
2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, 2015 IEEE 12th International Conference on Embedded Software and Systems | 2015 | pp. 190-197 | 8 pages
Conference location: New York, NY (US)
Authors
Yan Tang; Jianwei Yin; Wei Lo;
Author affiliation

Coll. of Comput. Sci. Technol., Zhejiang Univ., Hangzhou, China;

Original format: PDF
Language: English
Keywords
Big Data; cloud computing; storage management; SAUD; cloud storage systems; file-level semantics; primary storage; semantics-aware and utility-driven deduplication framework; Delays; Hardware; Indexes; Real-time systems; Semantics; Servers; System performance; Deduplication; Semantics-aware; Utility-driven

Date indexed: 2022-08-26 13:53:54
