...
首页> 外文期刊>LIPIcs : Leibniz International Proceedings in Informatics >Buffered Count-Min Sketch on SSD: Theory and Experiments
【24h】

Buffered Count-Min Sketch on SSD: Theory and Experiments

机译:SSD上的缓冲计数最小草图:理论和实验

获取原文
   

获取外文期刊封面封底 >>

       

摘要

Frequency estimation data structures such as the count-min sketch (CMS) have found numerous applications in databases, networking, computational biology and other domains. Many applications that use the count-min sketch process massive and rapidly evolving data sets. For data-intensive applications that aim to keep the overestimate error low, the count-min sketch becomes too large to store in available RAM and may have to migrate to external storage (e.g., SSD.) Due to the random-read/write nature of hash operations of the count-min sketch, simply placing it on SSD stifles the performance of time-critical applications, requiring about 4-6 random reads/writes to SSD per estimate (lookup) and update (insert) operation.In this paper, we expand on the preliminary idea of the buffered count-min sketch (BCMS) {[Eydi et al., 2017]}, an SSD variant of the count-min sketch, that uses hash localization to scale efficiently out of RAM while keeping the total error bounded. We describe the design and implementation of the buffered count-min sketch, and empirically show that our implementation achieves 3.7 x-4.7 x speedup on update and 4.3 x speedup on estimate operations compared to the traditional count-min sketch on SSD.Our design also offers an asymptotic improvement in the external-memory model over the original data structure: r random I/Os are reduced to 1 I/O for the estimate operation. For a data structure that uses k blocks on SSD, w as the word/counter size, r as the number of rows, M as the number of bits in the main memory, our data structure uses kwr/M amortized I/Os for updates, or, if kwr/M 1, 1 I/O in the worst case. In typical scenarios, kwr/M is much smaller than 1. This is in contrast to O(r) I/Os incurred for each update in the original data structure.Lastly, we mathematically show that for the buffered count-min sketch, the error rate does not substantially degrade over the traditional count-min sketch. Specifically, we prove that for any query q, our data structure provides the guarantee: Pr[Error(q) = n epsilon (1+o(1))] = delta + o(1), which, up to o(1) terms, is the same guarantee as that of a traditional count-min sketch.
机译:频率估算数据结构(例如最小计数草图(CMS))已在数据库,网络,计算生物学和其他领域中找到了许多应用。许多使用最小计数草图的应用程序会处理大量且快速发展的数据集。对于旨在将高估错误保持在较低水平的数据密集型应用程序,最小计数草图变得太大而无法存储在可用的RAM中,并且可能必须迁移到外部存储设备(例如SSD)。由于随机读取/写入的特性最小计数草图的哈希操作的过程,将其简单地放在SSD上会扼杀时间紧迫的应用程序的性能,每个估计(查找)和更新(插入)操作需要对SSD进行约4-6次随机读取/写入。 ,我们扩展了缓冲计数最小草图(BCMS){[Eydi et al。,2017]}的初步思想,它是计数最小草图的SSD变体,它使用哈希本地化来有效地扩展RAM之外,同时保持总误差有界。我们描述了缓冲的最小计数草图的设计和实现,并通过经验证明与传统的SSD最小计数草图相比,我们的实现在更新时实现3.7 x-4.7 x加速,在估计操作中实现4.3 x加速。与原始数据结构相比,它在外部存储器模型中提供了渐近的改进:r随机I / O减少为1 I / O用于估计操作。对于在SSD上使用k个块的数据结构,w为字/计数器大小,r为行数,M为主内存中的位数,我们的数据结构使用kwr / M摊余I / O进行更新,或者,如果kwr / M> 1,在最坏的情况下为1 I / O。在典型情况下,kwr / M远小于1。这与原始数据结构中每次更新产生的O(r)I / O相反。最后,我们从数学上表明,对于缓冲的最小计数草图,误差率在传统的最小计数草图上基本上不会降低。具体来说,我们证明对于任何查询q,我们的数据结构都提供了保证:Pr [Error(q)> = n epsilon(1 + o(1))] <= delta + o(1),其中最大为o (1)术语与传统的计数分钟草图相同。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号