Buffered Count-Min Sketch on SSD: Theory and Experiments

Mayank Goswami; Dzejla Medjedovic; Emina Mekic; Prashant Pandey

首页> 外文期刊>LIPIcs : Leibniz International Proceedings in Informatics >Buffered Count-Min Sketch on SSD: Theory and Experiments

【24h】

Buffered Count-Min Sketch on SSD: Theory and Experiments

机译：SSD上的缓冲计数最小草图：理论和实验

获取原文

获取外文期刊封面封底 >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

Frequency estimation data structures such as the count-min sketch (CMS) have found numerous applications in databases, networking, computational biology and other domains. Many applications that use the count-min sketch process massive and rapidly evolving data sets. For data-intensive applications that aim to keep the overestimate error low, the count-min sketch becomes too large to store in available RAM and may have to migrate to external storage (e.g., SSD.) Due to the random-read/write nature of hash operations of the count-min sketch, simply placing it on SSD stifles the performance of time-critical applications, requiring about 4-6 random reads/writes to SSD per estimate (lookup) and update (insert) operation.In this paper, we expand on the preliminary idea of the buffered count-min sketch (BCMS) {[Eydi et al., 2017]}, an SSD variant of the count-min sketch, that uses hash localization to scale efficiently out of RAM while keeping the total error bounded. We describe the design and implementation of the buffered count-min sketch, and empirically show that our implementation achieves 3.7 x-4.7 x speedup on update and 4.3 x speedup on estimate operations compared to the traditional count-min sketch on SSD.Our design also offers an asymptotic improvement in the external-memory model over the original data structure: r random I/Os are reduced to 1 I/O for the estimate operation. For a data structure that uses k blocks on SSD, w as the word/counter size, r as the number of rows, M as the number of bits in the main memory, our data structure uses kwr/M amortized I/Os for updates, or, if kwr/M 1, 1 I/O in the worst case. In typical scenarios, kwr/M is much smaller than 1. This is in contrast to O(r) I/Os incurred for each update in the original data structure.Lastly, we mathematically show that for the buffered count-min sketch, the error rate does not substantially degrade over the traditional count-min sketch. Specifically, we prove that for any query q, our data structure provides the guarantee: Pr[Error(q) = n epsilon (1+o(1))] = delta + o(1), which, up to o(1) terms, is the same guarantee as that of a traditional count-min sketch.

机译：频率估算数据结构（例如最小计数草图（CMS））已在数据库，网络，计算生物学和其他领域中找到了许多应用。许多使用最小计数草图的应用程序会处理大量且快速发展的数据集。对于旨在将高估错误保持在较低水平的数据密集型应用程序，最小计数草图变得太大而无法存储在可用的RAM中，并且可能必须迁移到外部存储设备（例如SSD）。由于随机读取/写入的特性最小计数草图的哈希操作的过程，将其简单地放在SSD上会扼杀时间紧迫的应用程序的性能，每个估计（查找）和更新（插入）操作需要对SSD进行约4-6次随机读取/写入。，我们扩展了缓冲计数最小草图（BCMS）{[Eydi et al。，2017]}的初步思想，它是计数最小草图的SSD变体，它使用哈希本地化来有效地扩展RAM之外，同时保持总误差有界。我们描述了缓冲的最小计数草图的设计和实现，并通过经验证明与传统的SSD最小计数草图相比，我们的实现在更新时实现3.7 x-4.7 x加速，在估计操作中实现4.3 x加速。与原始数据结构相比，它在外部存储器模型中提供了渐近的改进：r随机I / O减少为1 I / O用于估计操作。对于在SSD上使用k个块的数据结构，w为字/计数器大小，r为行数，M为主内存中的位数，我们的数据结构使用kwr / M摊余I / O进行更新，或者，如果kwr / M> 1，在最坏的情况下为1 I / O。在典型情况下，kwr / M远小于1。这与原始数据结构中每次更新产生的O（r）I / O相反。最后，我们从数学上表明，对于缓冲的最小计数草图，误差率在传统的最小计数草图上基本上不会降低。具体来说，我们证明对于任何查询q，我们的数据结构都提供了保证：Pr [Error（q）> = n epsilon（1 + o（1））] <= delta + o（1），其中最大为o （1）术语与传统的计数分钟草图相同。

著录项

来源
《LIPIcs : Leibniz International Proceedings in Informatics》 |2018年第1期|共15页
作者
Mayank Goswami; Dzejla Medjedovic; Emina Mekic; Prashant Pandey;
展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种
中图分类计算技术、计算机技术;
关键词
Streaming modelCount-min sketchCountingFrequencyExternal memoryI/O efficiencyBloom filterCounting filterQuotient filter;

机译：流模型最小计数计数频率外部存储器I / O效率布鲁姆过滤器计数过滤器商过滤器;

相似文献

外文文献
中文文献
专利

1. Buffered Count-Min Sketch on SSD: Theory and Experiments [J] . Mayank Goswami, Dzejla Medjedovic, Emina Mekic, LIPIcs : Leibniz International Proceedings in Informatics . 2018,第2期

机译：SSD上的缓冲计数最小草图：理论和实验
2. Electronic structure study of the CdS buffer layer in CIGS solar cells by X-ray absorption spectroscopy: Experiment and theory [J] . Schwartz Craig, Nordlund Dennis, Weng Tsu-Chien, Solar Energy Materials and Solar Cells: An International Journal Devoted to Photovoltaic, Photothermal, and Photochemical Solar Energy Conversion . 2016,第Null期

机译：X射线吸收光谱法研究CIGS太阳能电池中CdS缓冲层的电子结构：实验和理论
3. Interface science of virtual GaN substrates on Si(111) via Sc2O3/Y2O3 buffers: Experiment and theory [J] . Tarnawska L., Dabrowski J., Grzela T., Journal of Applied Physics . 2013,第21期

机译：通过Sc2O3 / Y2O3缓冲液在Si（111）上的虚拟GaN衬底的界面科学：实验和理论
4. STCCS: Segmented Time Controlled Count-Min Sketch [C] . Ismail Khram, Maha Shamseddine, Wassim Itani ITU Kaleidoscope Academic Conference . 2020

机译：STCCS：分段时间控制数米速素
5. THE SKETCH BOOK OF NATHANIEL HAWTHORNE: THEORY AND PRACTICE OF LITERARY SKETCHING IN THE NINETEENTH CENTURY. [D] . WEGNER, GREGORY ROBERT. 1987

机译：NATHANIEL HAWTHORNE速写本：十九世纪文学速写的理论和实践。
6. Tropical Fever and Dysentery; Being the Result of an Experiment by Order of Government in the General Hospital Calcutta and Garrison Hospital Fort-William in the Years 1849-50 with a Sketch of the Theory Resulting [O] . E. Hare 1854

机译：热带热和痢疾；作为政府命令在1849至50年间在加尔各答综合医院和威廉堡驻军医院进行实验的结果并附上了理论成果
7. Cross sections for single and double strand breaks in SV-40 virus in EO buffer after heavy ion irradiation: Experiment and theory [O] . Katz, Robert, Wesely, S. 1991

机译：重离子辐照后EO缓冲液中sV-40病毒单链和双链断裂的横截面：实验和理论
8. On Field Experiments, with a Sketch of a Plan for a Wind Wave Generation Experiment to Be Carried out off Aruba, N.A. Aboard the R/V Ridgely Warfield [R] . Kinsman, B. 1968

机译：在野外实验中，以及在R / V Ridgely Warfield上进行的风力发电实验计划的草图，将在阿鲁巴岛进行。

Buffered Count-Min Sketch on SSD: Theory and Experiments

摘要

著录项

相似文献

相关主题

期刊订阅