Computing Reviews

DCDedupe: selective deduplication and delta compression with effective routing for distributed storage


Abstract

In big data scenarios, data is often duplicated for system response time, efficiency, and network reasons and then eliminated through deduplication. Data is compressed for storage efficiency and further abstraction or regression. The authors propose DCDedupe, a system for on-premise distributed storage that selectively compresses data and performs deduplication efficiently, using analytics, hardware acceleration, and design parameters (cost, efficiency, and effectiveness). DCDedupe is centered on (1) a quick decision mechanism that yields acceptable accuracy and (2) an algorithm for selecting, marshaling, and routing distributed data chunks to ensure they are sent to the right nodes of the distributed data system. The paper is divided into sections: "Introduction," "Related Work," "Deduplication vs. Delta Compression," "Design," "Evaluation," and "Conclusions." Drawing on conclusions from a case study, the design section describes DCDedupe's design principles and considerations, the choice of architecture and system, chunk classification methods, routing algorithms, delta compression levels, and the overall work and data flow. The evaluation section covers the experimental setup, storage efficiency results, sampling methods, and memory usage for sampling records. In the last section, the authors conclude that DCDedupe improves decision-making accuracy in pre-processing and reduces storage space requirements by 30 percent; however, there is some penalty on processing speed (between 15 and 22 percent). Further work on pre-processing methods, fault tolerance enhancements, and server overload is required.
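To make the routing idea concrete, here is a minimal sketch of fingerprint-based versus similarity-based chunk routing. It is an illustration only, not DCDedupe's algorithm: the paper's own classifier, features, and routing rules are described in its design section, and the SHA-1 fingerprint, windowed similarity feature, duplicate_prone flag, and four-node cluster below are assumptions made purely for this sketch.

```python
import hashlib

NUM_NODES = 4  # assumed cluster size; not specified in the review


def fingerprint(chunk: bytes) -> str:
    """Exact-content fingerprint, used to detect duplicate chunks."""
    return hashlib.sha1(chunk).hexdigest()


def similarity_feature(chunk: bytes, window: int = 64) -> str:
    """Crude similarity sketch: hash of the lexicographically smallest
    fixed-size window. A stand-in for whatever chunk-classification
    features DCDedupe actually extracts."""
    windows = [chunk[i:i + window] for i in range(0, len(chunk), window)] or [chunk]
    return hashlib.sha1(min(windows)).hexdigest()


def route_chunk(chunk: bytes, duplicate_prone: bool) -> int:
    """Route exact-duplicate candidates by their fingerprint and
    delta-compression candidates by their similarity feature, so that
    matching or similar data lands on the same storage node."""
    key = fingerprint(chunk) if duplicate_prone else similarity_feature(chunk)
    return int(key, 16) % NUM_NODES


if __name__ == "__main__":
    # A chunk classified as duplicate-prone routes by its exact fingerprint;
    # one classified as delta-compressible routes by its similarity feature.
    payload = b"example chunk payload"
    print(route_chunk(payload, duplicate_prone=True))
    print(route_chunk(payload, duplicate_prone=False))
```

The point of routing by these keys is that matching or similar chunks land on the same node, presumably so that each node can deduplicate or delta-compress locally without cross-node lookups.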
