Content-dependent chunking for differential compression, the local maximum approach

Nikolaj Bjorner; Andreas Blass; Yuri Gurevich


Abstract

When a file is to be transmitted from a sender to a recipient and when the latter already has a file somewhat similar to it, remote differential compression seeks to determine the similarities interactively so as to transmit only the part of the new file not already in the recipient's old file. Content-dependent chunking means that the sender and recipient chop their files into chunks, with the cutpoints determined by some internal features of the files, so that when segments of the two files agree (possibly in different locations within the files) the cutpoints in such segments tend to be in corresponding locations, and so the chunks agree. By exchanging hash values of the chunks, the sender and recipient can determine which chunks of the new file are absent from the old one and thus need to be transmitted.

We propose two new algorithms for content-dependent chunking, and we compare their behavior, on random files, with each other and with previously used algorithms. One of our algorithms, the local maximum chunking method, has been implemented and found to work better in practice than previously used algorithms.

Theoretical comparisons between the various algorithms can be based on several criteria, most of which seek to formalize the idea that chunks should be neither too small (so that hashing and sending hash values become inefficient) nor too large (so that agreements of entire chunks become unlikely). We propose a new criterion, called the slack of a chunking method, which seeks to measure how much of an interval of agreement between two files is wasted because it lies in chunks that don't agree. Finally, we show how to efficiently find the cutpoints for local maximum chunking.
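The chunking idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The abstract does not define the local maximum rule, so the sketch assumes that a position is a cutpoint when its hash value is strictly greater than the hash values of all positions within a fixed horizon on either side. The function names, the `horizon` and `window` parameters, and the use of SHA-256 are hypothetical choices. The scan below is the naive one, not the efficient cutpoint search the abstract mentions.

```python
import hashlib
from typing import List


def local_max_cutpoints(data: bytes, horizon: int = 16, window: int = 4) -> List[int]:
    """Return positions whose hash is a strict maximum within `horizon` on both sides.

    Assumed rule (not stated in the abstract): position i is a cutpoint when its
    hash value exceeds the hash values of the `horizon` positions before it and
    the `horizon` positions after it.
    """
    n = len(data)
    # Hash the `window` bytes ending at each position, so the value depends
    # only on nearby content and not on the position within the file.
    values = [
        int.from_bytes(
            hashlib.sha256(data[max(0, i - window + 1): i + 1]).digest()[:4], "big"
        )
        for i in range(n)
    ]
    cuts = []
    # Naive scan: compare each candidate against its 2 * horizon neighbours.
    for i in range(horizon, n - horizon):
        neighbours = values[i - horizon:i] + values[i + 1:i + horizon + 1]
        if all(values[i] > v for v in neighbours):
            cuts.append(i)
    return cuts


def chunks(data: bytes, horizon: int = 16, window: int = 4) -> List[bytes]:
    """Split `data` so that each chunk ends at a cutpoint (inclusive)."""
    out, start = [], 0
    for cut in local_max_cutpoints(data, horizon, window):
        out.append(data[start:cut + 1])
        start = cut + 1
    out.append(data[start:])  # trailing bytes after the last cutpoint
    return out


def missing_chunks(new: bytes, old: bytes, horizon: int = 16, window: int = 4) -> List[bytes]:
    """Chunks of `new` whose hash does not match any chunk hash of `old`."""
    known = {hashlib.sha256(c).digest() for c in chunks(old, horizon, window)}
    return [
        c for c in chunks(new, horizon, window)
        if hashlib.sha256(c).digest() not in known
    ]
```

Because each hash value depends only on a few nearby bytes, inserting a prefix into a file shifts the cutpoints along with the content. Away from the insertion point the chunks are unchanged, so `missing_chunks(new, old)` returns only the chunks near the edit, and only those would need to be transmitted.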


Bibliographic details

Source
Journal of Computer and System Sciences | 2010, Issue 4 | pp. 154-203 | 50 pages
Authors
Nikolaj Bjorner; Andreas Blass; Yuri Gurevich
Author affiliations

Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA;

Mathematics Department, University of Michigan, Ann Arbor, MI 48109-1043, USA;

Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA

Indexed in
Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format
PDF
Language of text
English
Keywords
networking; distributed file systems; compression; ergodic theory

Similar documents

1. A scalable data chunk similarity based compression approach for efficient big sensing data processing on cloud [J]. P. Jouvelot. Computing Reviews. 2017, Issue 10

2. A Scalable Data Chunk Similarity Based Compression Approach for Efficient Big Sensing Data Processing on Cloud [J]. Chi Yang, Jinjun Chen. IEEE Transactions on Knowledge and Data Engineering. 2017, Issue 6

3. Differential volumetric analysis combined with monitoring of differential ultra-sound travel time, an approach for tracing fine-structural compression behaviour of solids: a study on MgO / periclase and polyethylene / PE up to 1.5 GPa at room temperature [J]. Physics and Chemistry of Minerals. 2020, Issue 6

4. Sound localization and tracking using distributed microphones fusion: Maximum Likelihood or Maximum A-Posteriori approach? [C]. Elahi Ehtsham. 2nd International Conference on Computer, Control and Communication (IC4 2009). 2009

5. A Maximum-Likelihood Approach for Localizing and Characterizing Special Nuclear Material with a Dual-Particle Imager [D]. Polack, John Kyle. 2016

6. Bayesian Maximum-A-Posteriori Approach with Global and Local Regularization to Image Reconstruction Problem in Medical Emission Tomography [O]. Natalya Denisova. 2019

7. Content-dependent chunking for differential compression, the local maximum approach [O]. Bjørner Nikolaj, Blass Andreas, Gurevich Yuri. 2010

8. Effect of Variation in Rivet Strength on the Average Stress at Maximum Load for Aluminum-Alloy, Flat, Z-Stiffened Compression Panels That Fail by Local Buckling [R]. Norris F. Dow, William A. Hickman, B. Walter Rosen. 1953
