IEEE International Conference on Parallel and Distributed Systems

Utilizing SSD to Alleviate Chunk Fragmentation in De-Duplicated Backup Systems



Abstract

Data deduplication, which removes redundant data so that only one copy of each duplicate block needs to be physically stored, has been implemented in almost all storage appliances, including archival and backup systems, primary data storage, and SSD devices, to save storage space. However, as time goes by and more backups are ingested into the system, a fragmentation problem emerges: logically contiguous data blocks of later-stored datasets become dispersed across a large storage space, so restoring them requires many extra disk accesses, significantly degrading restore performance and garbage collection efficiency. Existing approaches to the fragmentation problem trade space savings for performance by selectively rewriting trouble-causing duplicate blocks during deduplication, even though those blocks are already stored elsewhere. However, rewriting chunks slows down the backup process and reduces deduplication efficiency, since many duplicate chunks are retained in the system. In this work, we propose deploying flash-based SSDs in the system to overcome the limitations of rewriting algorithms by exploiting the high performance SSDs provide. Specifically, instead of rewriting, we migrate trouble-causing blocks to SSD storage in the background when duplicate blocks are encountered. This design is motivated by two observations. First, a separate migration process leverages the computing power of modern multi-core architectures. Second, restores are typically not performed immediately after backups, so there is no need to rewrite blocks on the performance-critical backup path. We apply our approach to two rewriting schemes and conduct comprehensive evaluations of its efficacy. Our results show that by provisioning a reasonable amount of SSD, backup performance and deduplication efficiency can be significantly improved, at the cost of a slight increase in the number of container reads associated with restore operations.
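To make the migration idea concrete, the following is a minimal Python sketch of the policy described above, assuming fixed-size chunking, an in-memory fingerprint index, and a crude container-distance test for identifying "trouble-causing" duplicates; the names, thresholds, and locality heuristic here are illustrative assumptions, not taken from the paper.

    import hashlib
    import queue
    import threading

    CHUNK_SIZE = 4 * 1024        # fixed-size chunking, for simplicity
    CONTAINER_CHUNKS = 256       # chunks per sealed container
    LOCALITY_WINDOW = 8          # hypothetical "nearby containers" threshold

    index = {}                   # fingerprint -> container id (HDD tier)
    containers = [[]]            # HDD container store; last entry is open
    ssd_store = {}               # fingerprint -> chunk (small SSD tier)
    migration_q = queue.Queue()  # trouble-causing duplicates to migrate

    def fingerprint(chunk):
        return hashlib.sha1(chunk).hexdigest()

    def backup(stream):
        for off in range(0, len(stream), CHUNK_SIZE):
            chunk = stream[off:off + CHUNK_SIZE]
            fp = fingerprint(chunk)
            if fp in index:
                # Duplicate. A rewriting scheme would re-store the chunk
                # inline here; instead, we enqueue it for background
                # migration to SSD when its container is "far" from the
                # one currently being filled (a stand-in locality test).
                if len(containers) - 1 - index[fp] > LOCALITY_WINDOW:
                    migration_q.put((fp, chunk))
            else:
                containers[-1].append(chunk)       # unique: write to HDD
                index[fp] = len(containers) - 1
                if len(containers[-1]) == CONTAINER_CHUNKS:
                    containers.append([])          # seal, open a new one

    def migrate_worker():
        # Off the critical backup path: restores rarely follow backups
        # immediately, so migration can run on a spare core.
        while True:
            fp, chunk = migration_q.get()
            ssd_store[fp] = chunk   # restores read this copy from SSD
            migration_q.task_done()

    threading.Thread(target=migrate_worker, daemon=True).start()

The point of the sketch is the control flow: the backup loop never blocks on a duplicate chunk, and the SSD copy is produced asynchronously, which is consistent with the abstract's claim of improved backup performance at the cost of slightly more container reads on restore.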
