
HDKV: supporting efficient high-dimensional similarity search in key-value stores


Abstract

Key-value stores are widely used for large-scale data management in cloud environments. However, they natively support only key-based queries and have no efficient solution for value-based queries, so handling high-dimensional data in key-value stores remains a major challenge. State-of-the-art solutions address this with value-based tree-structured indexes. These methods suffer from the curse of dimensionality and cannot achieve satisfactory performance. They also cause serious load imbalance among servers, which sharply degrades system scalability. Meanwhile, similarity search in high-dimensional data spaces is becoming increasingly popular in today's cloud applications. Because efficient algorithms for value-based queries are lacking, users must wait a long time before results are returned. To address this issue, we propose a novel approach called high-dimensional similarity query in key-value stores (HDKV), which returns similarity results in a short time while maintaining good database scalability. In HDKV, a strict order-preserving hash function is designed to map nearby objects in the high-dimensional space onto adjacent keys of a continuous linear space in the key-value store. With this strategy, many expensive random accesses are replaced with more efficient scan accesses. An experimental evaluation on a real-world data set shows that, compared with state-of-the-art methods, HDKV dramatically reduces search time with little impact on accuracy.
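The idea of mapping nearby points to adjacent keys and then answering a query with one range scan can be illustrated with a minimal Python sketch. The abstract does not specify HDKV's hash function, so this sketch substitutes a Z-order (Morton) curve, a well-known locality-preserving mapping that is not claimed to be the paper's method. The names `morton_key`, `SortedKVStore`, `candidate_scan`, and `knn`, the 8-bit quantization, and the scan `width` parameter are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

BITS = 8  # assumption: each coordinate is an integer in [0, 255]


def morton_key(point, bits=BITS):
    """Interleave coordinate bits into one integer key (Z-order curve).

    Stand-in for HDKV's hash function, which the abstract does not define.
    """
    key = 0
    for bit in range(bits - 1, -1, -1):  # most significant bit first
        for coord in point:
            key = (key << 1) | ((coord >> bit) & 1)
    return key


class SortedKVStore:
    """Toy stand-in for an ordered key-value store that supports range scans."""

    def __init__(self):
        self._keys = []     # keys kept in sorted order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._values[key] = value

    def scan(self, lo, hi):
        """Return (key, value) pairs with lo <= key <= hi, in key order."""
        start = bisect_left(self._keys, lo)
        end = bisect_right(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[start:end]]


def candidate_scan(store, query, width):
    """Fetch candidates with one sequential scan around the query's key."""
    center = morton_key(query)
    return [v for _, v in store.scan(max(0, center - width), center + width)]


def knn(store, query, k, width):
    """Approximate k-NN: scan candidates, then rank them by exact distance."""
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))
    return sorted(candidate_scan(store, query, width), key=dist2)[:k]
```

Each point is stored under its curve key, so a query touches one contiguous key range instead of issuing a random lookup per candidate. The result is approximate: a true neighbor whose key falls outside the scanned window is missed, which is the kind of speed-versus-accuracy trade-off the abstract describes.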

Bibliographic record

  • Source
    Concurrency and Computation | 2013, Issue 12 | pp. 1675-1698 | 24 pages
  • Author affiliations

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate University of the Chinese Academy of Sciences, Beijing, China;

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China;

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate University of the Chinese Academy of Sciences, Beijing, China;

    Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China;

    Department of Math and Computer Science, Suffolk University, Boston, MA 02114, USA;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    database scalability; key-value stores; high-dimensional; similarity search; KNN query; range query;

