Published in: Journal of Computer and System Sciences

Content-dependent chunking for differential compression, the local maximum approach


Abstract

When a file is to be transmitted from a sender to a recipient and when the latter already has a file somewhat similar to it, remote differential compression seeks to determine the similarities interactively so as to transmit only the part of the new file not already in the recipient's old file. Content-dependent chunking means that the sender and recipient chop their files into chunks, with the cutpoints determined by some internal features of the files, so that when segments of the two files agree (possibly in different locations within the files) the cutpoints in such segments tend to be in corresponding locations, and so the chunks agree. By exchanging hash values of the chunks, the sender and recipient can determine which chunks of the new file are absent from the old one and thus need to be transmitted.

We propose two new algorithms for content-dependent chunking, and we compare their behavior, on random files, with each other and with previously used algorithms. One of our algorithms, the local maximum chunking method, has been implemented and found to work better in practice than previously used algorithms.

Theoretical comparisons between the various algorithms can be based on several criteria, most of which seek to formalize the idea that chunks should be neither too small (so that hashing and sending hash values become inefficient) nor too large (so that agreements of entire chunks become unlikely). We propose a new criterion, called the slack of a chunking method, which seeks to measure how much of an interval of agreement between two files is wasted because it lies in chunks that don't agree. Finally, we show how to efficiently find the cutpoints for local maximum chunking.
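The abstract does not spell out the cutpoint rule, so the following Python sketch rests on stated assumptions rather than on the authors' implementation. It assumes that each position gets a pseudo-random value (here a CRC-32 of a short window, a hypothetical choice) and that a position is a cutpoint when its value is strictly greater than every value within a fixed horizon on both sides. The names WINDOW, position_values, local_max_cutpoints, chunks, and missing_chunks are illustrative only. The scan is the naive one, not the efficient cutpoint search the abstract mentions.

```python
import hashlib
import random
import zlib
from typing import List

WINDOW = 4  # assumption: number of bytes hashed to give each position a value


def position_values(data: bytes) -> List[int]:
    """One pseudo-random value per position, taken from a CRC of a short window."""
    return [zlib.crc32(data[i:i + WINDOW]) for i in range(len(data) - WINDOW + 1)]


def local_max_cutpoints(data: bytes, horizon: int) -> List[int]:
    """Positions whose value strictly exceeds all values within `horizon` on each side.

    Assumed rule; positions without a full horizon on both sides never qualify.
    Naive O(n * horizon) scan.
    """
    vals = position_values(data)
    cuts = []
    for i in range(horizon, len(vals) - horizon):
        v = vals[i]
        left = vals[i - horizon:i]
        right = vals[i + 1:i + horizon + 1]
        if all(v > u for u in left) and all(v > u for u in right):
            cuts.append(i)
    return cuts


def chunks(data: bytes, horizon: int) -> List[bytes]:
    """Split data at the cutpoints; concatenating the chunks restores the data."""
    bounds = [0] + local_max_cutpoints(data, horizon) + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def missing_chunks(old: bytes, new: bytes, horizon: int) -> List[bytes]:
    """Chunks of `new` whose hash does not occur among the chunk hashes of `old`."""
    known = {hashlib.sha256(c).digest() for c in chunks(old, horizon)}
    return [c for c in chunks(new, horizon) if hashlib.sha256(c).digest() not in known]


# Demo on random data: the new file is the old file with 100 bytes prepended.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(20000))
new = bytes(rng.getrandbits(8) for _ in range(100)) + old
to_send = missing_chunks(old, new, 32)
print(sum(len(c) for c in to_send), "of", len(new), "bytes need sending")
```

Because the cutpoints depend only on nearby content, the prepended bytes shift the cutpoints of the shared region without changing which chunks it is cut into, so only the chunks near the insertion should fail to match. In this sketch the horizon is the knob that trades small chunks against large ones.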
