HDKV: supporting efficient high-dimensional similarity search in key-value stores

Wei Zhou; Jizhong Han; Zhang Zhang; Jiao Dai; Zhiyong Xu

Abstract

Key-value stores are widely used for large-scale data management in cloud environments. However, they naturally support only key-based queries and offer no efficient solution for value-based queries, so handling high-dimensional data in key-value stores remains a major challenge. State-of-the-art solutions address this with value-based tree-structured indexes. These methods suffer from the curse of dimensionality and cannot achieve satisfactory performance. They also cause serious load imbalance among servers, which dramatically degrades system scalability. Meanwhile, similarity search in high-dimensional data spaces is becoming increasingly popular in today's cloud applications. Because efficient algorithms for value-based queries are lacking, users have to wait a long time before results are returned. To address this issue, we propose a novel approach called high-dimensional similarity query in key-value stores (HDKV), which can generate similarity results in a short time and maintain good database scalability. In HDKV, a strict order-preserving hash function is designed to map nearby objects in the high-dimensional space onto adjacent keys of a continuous linear space in the key-value store. With this strategy, many expensive random accesses are replaced with more efficient scan accesses. The experimental evaluation on a real-world data set shows that, compared to state-of-the-art methods, HDKV can dramatically reduce search time with little impact on accuracy.
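The core idea in the abstract, mapping nearby points to adjacent keys so that a similarity query becomes a range scan, can be illustrated with a small sketch. The abstract does not specify HDKV's hash function, so the sketch below swaps in a Z-order (Morton) curve, a well-known locality-preserving mapping that is not claimed to be the authors' method and does not preserve proximity strictly. `morton_key`, `SortedKV`, `approx_knn`, and the `window` parameter are hypothetical names introduced only for illustration.

```python
from bisect import bisect_left, bisect_right, insort


def morton_key(point, bits=8):
    """Interleave coordinate bits into one integer key (Z-order curve).

    Assumption: coordinates are non-negative integers below 2**bits.
    This stands in for HDKV's hash, which the abstract does not describe.
    """
    dims = len(point)
    key = 0
    for bit in range(bits):
        for d, coord in enumerate(point):
            key |= ((coord >> bit) & 1) << (bit * dims + d)
    return key


class SortedKV:
    """Toy in-memory stand-in for a key-value store with ordered keys."""

    def __init__(self):
        self._keys = []   # keys kept sorted so range scans are contiguous
        self._vals = {}

    def put(self, key, value):
        if key not in self._vals:
            insort(self._keys, key)
        self._vals[key] = value

    def scan(self, lo, hi):
        """Return all (key, value) pairs with lo <= key <= hi, in key order."""
        i = bisect_left(self._keys, lo)
        j = bisect_right(self._keys, hi)
        return [(k, self._vals[k]) for k in self._keys[i:j]]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_knn(store, query, k, window, bits=8):
    """Approximate k-nearest neighbours: one range scan, then exact re-ranking."""
    center = morton_key(query, bits)
    candidates = store.scan(max(0, center - window), center + window)
    candidates.sort(key=lambda kv: squared_distance(kv[1], query))
    return [value for _, value in candidates[:k]]


store = SortedKV()
for p in [(0, 0), (1, 1), (2, 2), (10, 10), (200, 200)]:
    store.put(morton_key(p), p)

print(approx_knn(store, (1, 2), k=2, window=16))  # [(1, 1), (2, 2)]
```

The single `scan` call replaces one random lookup per candidate, which is the access pattern the abstract credits for the speedup. A larger `window` trades extra scanned keys for better recall, since points close in space can still land far apart on the curve.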


Bibliographic record

Source
Concurrency and Computation | 2013, Issue 12 | pp. 1675-1698 | 24 pages
Authors
Wei Zhou; Jizhong Han; Zhang Zhang; Jiao Dai; Zhiyong Xu
Author affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate University of the Chinese Academy of Sciences, Beijing, China

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate University of the Chinese Academy of Sciences, Beijing, China

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China

Department of Math and Computer Science, Suffolk University, Boston, MA 02114, USA

Original format: PDF
Language: English
Keywords
database scalability; key-value stores; high-dimensional; similarity search; KNN query; range query;


Similar literature


1. Linking identical neighborly partitions for efficient high-dimensional similarity search in unstructured peer-to-peer systems [J]. Bin Cui, Linhao Xu, Jiakui Zhao. Distributed and Parallel Databases. 2009, No. 2-3
2. Indexing high-dimensional data for efficient in-memory similarity search [J]. Bin Cui, Beng Chin Ooi, Jianwen Su. IEEE Transactions on Knowledge and Data Engineering. 2005, No. 3
3. SPY-TEC: An efficient indexing method for similarity search in high-dimensional data spaces [J]. Dong-Ho Lee, Hyoung-Joo Kim. Data & Knowledge Engineering. 2000, No. 1
4. A Large-Scale Online Search System of High-Dimensional Vectors Based on Key-Value Store [C]. Zhang Yuezhuo, Zha Li, Liu Jia. 2012 Eighth International Conference on Semantics, Knowledge and Grids. 2012
5. Efficient similarity search in high-dimensional data spaces [D]. Li, Yue. 2004
6. Real and synthetic data sets for benchmarking key-value stores focusing on various data types and sizes [O]. Hyuk-Yoon Kwon. 2020
7. Efficiently Supporting Edit Distance based String Similarity Search Using B+-trees [O]. Wei Lu, Xiaoyong Du, Marios Hadjieleftheriou. 2014
