Journal: Journal of Supercomputing

LSH-based distributed similarity indexing with load balancing in high-dimensional space

Abstract

Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for similarity search in high-dimensional space. Traditionally, these schemes are centrally managed, and multiple hash tables are needed to guarantee search quality. However, because a single server has limited storage space and processing capacity, centralized indexing becomes impractical for massive collections of data objects. Several distributed indexing schemes based on peer-to-peer (P2P) networks have therefore been proposed, but ensuring load balancing remains one of the key issues. To address this problem, we propose two theoretical LSH-based data distribution models in P2P networks, for datasets with homogeneous and heterogeneous ℓ2 norms. Unlike earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than across multiple tables, which has not been considered previously. We then propose a static distributed indexing scheme with a novel load-balancing index mapping method based on the cumulative distribution function derived from our models. Furthermore, we propose a dynamic load rebalancing algorithm that uses the virtual node method of P2P networks to make the static indexing scheme more practical and robust. Experiments on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in high-dimensional similarity indexing.
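The abstract does not spell out the paper's distribution models or its mapping method, so the following is only an illustrative sketch of the general idea of CDF-based index mapping, not the authors' algorithm. It assumes the standard p-stable LSH function for Euclidean distance, h(x) = floor((a·x + b) / w), unit-norm data (a stand-in for "homogeneous" norms), and a normal distribution fitted to the observed bucket values. All function names and parameters (`lsh_bucket`, `cdf_node`, `uniform_node`, `w`, `num_nodes`) are hypothetical. The sketch compares a naive equal-width split of the bucket range with a split that gives each node an equal share of probability mass.

```python
import math
import random


def lsh_bucket(point, a, b, w):
    """Standard p-stable LSH bucket for Euclidean distance: floor((a.x + b) / w)."""
    return math.floor((sum(ai * xi for ai, xi in zip(a, point)) + b) / w)


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, used here as an assumed model of bucket values."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def cdf_node(bucket, mean, std, num_nodes):
    """Assign a bucket to a node so each node covers equal probability mass."""
    u = normal_cdf(bucket, mean, std)
    return min(int(u * num_nodes), num_nodes - 1)


def uniform_node(bucket, lo, hi, num_nodes):
    """Baseline: split the observed bucket range [lo, hi] into equal-width intervals."""
    frac = (bucket - lo) / (hi - lo + 1)
    return min(int(frac * num_nodes), num_nodes - 1)


def imbalance(loads):
    """Maximum load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


random.seed(0)
dim, n, num_nodes, w = 32, 5000, 8, 0.01

# One random projection (a single hash table, as in the abstract's focus).
a = [random.gauss(0.0, 1.0) for _ in range(dim)]
b = random.uniform(0.0, w)

# Synthetic unit-norm points: an assumed example of homogeneous l2 norms.
points = []
for _ in range(n):
    v = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    points.append([x / norm for x in v])

buckets = [lsh_bucket(p, a, b, w) for p in points]

# Fit the assumed normal model to the observed bucket values.
mean = sum(buckets) / n
std = math.sqrt(sum((h - mean) ** 2 for h in buckets) / n)
lo, hi = min(buckets), max(buckets)

cdf_loads = [0] * num_nodes
uniform_loads = [0] * num_nodes
for h in buckets:
    cdf_loads[cdf_node(h, mean, std, num_nodes)] += 1
    uniform_loads[uniform_node(h, lo, hi, num_nodes)] += 1

print("equal-width imbalance:", round(imbalance(uniform_loads), 2))
print("CDF-based imbalance:  ", round(imbalance(cdf_loads), 2))
```

Because projected values cluster around the mean, the equal-width split overloads the nodes that own the central buckets, while the CDF-based split spreads objects more evenly. A dynamic scheme with virtual nodes, as mentioned in the abstract, would additionally move ownership of intervals between peers at run time; that part is not sketched here.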
