Cache-Oblivious Hashing

Rasmus Pagh; Zhewei Wei; Ke Yi; Qin Zhang

首页> 外文期刊>Algorithmica >Cache-Oblivious Hashing

【24h】

Cache-Oblivious Hashing

机译：缓存不可忽略的哈希

获取原文

获取原文并翻译 | 示例

获取外文期刊封面封底 >>

开具论文收录证明 >>

文献代查 >>

团队文献服务 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average t_q = 1 +1 /2~Ω(b) disk accesses for any load factor α bounded away from 1. However, such near-perfect performance is achieved only when b is known and the hash table is particularly tuned for working with such a blocking. In this paper we study if it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardware-specific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision resolution strategy for hash tables, can be easily made cache-oblivious but it only achieves t_q ＝ 1 +Θ(α/b) even if a truly random hash function is used. Then we demonstrate that the block probing algorithm (Pagh et al. in SIAM Rev. 53(3):547-558,2011) achieves t_q = 1 +1 /2~Ω(b), thus matching the cache-aware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Note that the two conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is t_q = 1 +O(α/b), which is exactly what linear probing achieves.

机译：哈希表，尤其是其外部存储器版本，是大型数据库中最重要的索引结构之一。假设真正的随机哈希函数，已知在块大小为b的标准外部哈希表中，对于任何负载因子α，搜索特定密钥只需要预期的平均t_q = 1 +1 / 2〜Ω（b）磁盘访问限制为1。但是，只有当b已知且哈希表经过特别调整以用于这种阻塞时，才能实现这种近乎完美的性能。在本文中，我们研究是否有可能构建一个对任何阻塞都适用的，可以忽略缓存的哈希表。这样的哈希表将在内存层次结构的所有级别上自动执行良好，并且不需要任何特定于硬件的调整，这是自治数据库中的重要功能。我们首先表明，线性探测是哈希表的经典冲突解决策略，可以很容易地使缓存忽略，但是即使使用真正的随机哈希函数，也只能达到t_q ＝ 1 +Θ（α/ b）。然后，我们证明了块探测算法（Pagh等人在SIAM Rev. 53（3）：547-558,2011中）实现了t_q = 1 +1 / 2/2〜Ω（b），从而匹配了缓存感知的边界，如果满足以下两个条件：（a）b是2的幂。（b）每个块均始于可被b整除的存储器地址。请注意，这两个条件在实际机器上均有效，尽管在忽略缓存的模型中未说明。有趣的是，我们还显示了这两个条件都不是可有可无的：如果将它们中的任何一个都删除，则可获得的最佳边界是t_q = 1 + O（α/ b），这正是线性探测所能实现的。

著录项

来源
《Algorithmica 》 |2014年第4期| 864-883| 共20页
作者
Rasmus Pagh; Zhewei Wei; Ke Yi; Qin Zhang;
展开▼
作者单位

IT University of Copenhagen, Copenhagen, Denmark;

Department of Computer Science, MADALGO (Center for Massive Data Algorithmics-A Center of the Danish National Research Foundation), Aarhus University, Aarhus, Denmark;

Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China;

Indiana University Bloomington, Bloomington, IN, USA;

展开▼
收录信息美国《科学引文索引》(SCI);美国《工程索引》(EI);
原文格式 PDF
正文语种 eng
中图分类
关键词
Cache-oblivious algorithms; Hashing;

机译：缓存忽略算法;散列;

相似文献

外文文献
中文文献
专利

1. Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and k-mer Hashing [J] . Petrucci Enrico, Noe Laurent, Pizzi Cinzia, Journal of computational biology: A journal of computational molecular cell biology . 2020 ,第2期

机译：迭代间隔的种子散列：关闭间距种子散列与K-MEL散列之间的间隙
2. Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and k -mer Hashing [J] . Enrico Petrucci, Laurent Noé, Cinzia Pizzi, Journal of computational biology . 2020 ,第2期

机译：迭代间隔的种子散列：关闭间隔的种子散列和K-MER散列之间的间隙
3. Hashing Nets for Hashing: A Quantized Deep Learning to Hash Framework for Remote Sensing Image Retrieval [J] . Li Peng, Han Lirong, Tao Xuanwen, IEEE Transactions on Geoscience and Remote Sensing . 2020 ,第10期

机译：用于哈希的散列网：遥感图像检索散列框架的量化深度学习
4. Cache-Oblivious Hashing [C] . Rasmus Pagh, Zhewei Wei, Ke Yi, Proceedings of the Twenty-Ninth ACM (Association for Computing Machinery) SIGMOD-SIGACT-SIGART symposium on principles of database systems . 2010

机译：缓存不可忽略的哈希
5. Improving Cuckoo Hashing with Perfect Hashing [D] . Chadalavada, Moulika. 2017

机译：完善哈柴藏的完善哈希
6. Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER [O] . Miguel Ferreira, Nuno Roma, Luis MS Russo 2014

机译：用于HMMER中序列搜索的高速缓存不并行SIMD维特比解码
7. Cache-oblivious hashing [O] . Pagh, Rasmus, Wei, Zhewei, Yi, Ke, 2014

机译：缓存无关的哈希

Cache-Oblivious Hashing

摘要

著录项

相似文献

相关主题

期刊订阅