Conference: IEEE Conference on Computer Communications Workshops

Shuffle-Efficient Distributed Locality Sensitive Hashing on Spark


Abstract

Locality Sensitive Hashing (LSH) is an important indexing technique for approximate similarity search in high-dimensional spaces. An obvious limitation of LSH approaches is that they lack the capacity and scalability to handle massive data. This paper proposes a distributed variant of LSH called Spark-LSH, implemented on Apache Spark, a well-known distributed computing framework. We design a shuffle-efficient indexing scheme for Spark-LSH that reduces data shuffling and improves network efficiency when constructing the hash table indices. Furthermore, we propose a location-aware querying scheme to improve query performance. Experiments show that Spark-LSH substantially reduces network shuffle overhead and significantly accelerates queries.
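
For readers unfamiliar with LSH, the following is a minimal single-machine sketch of the general idea the abstract builds on: similar vectors are hashed into the same bucket with high probability, so a query only inspects its own buckets. It is not the Spark-LSH scheme and does not model shuffling or location-aware querying. The choice of random-hyperplane (sign) hashing and all names and parameters (`make_hyperplanes`, `signature`, `build_index`, `query`, `num_tables`, `num_bits`) are assumptions made for illustration and do not come from the paper.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_bits, dim, rng):
    """Draw num_bits random Gaussian hyperplanes of dimension dim."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(points, tables):
    """Build one hash table per hyperplane set: signature -> list of ids."""
    index = [defaultdict(list) for _ in tables]
    for pid, vec in points.items():
        for table, planes in zip(index, tables):
            table[signature(vec, planes)].append(pid)
    return index

def query(vec, index, tables):
    """Return ids that share a bucket with vec in at least one table."""
    candidates = set()
    for table, planes in zip(index, tables):
        candidates.update(table.get(signature(vec, planes), []))
    return candidates

# Hypothetical toy data: 100 random 8-dimensional points.
rng = random.Random(0)
dim, num_tables, num_bits = 8, 4, 6
tables = [make_hyperplanes(num_bits, dim, rng) for _ in range(num_tables)]
points = {i: [rng.gauss(0.0, 1.0) for _ in range(dim)] for i in range(100)}
index = build_index(points, tables)
result = query(points[7], index, tables)
```

In this sketch, `result` is a candidate set that would normally be re-ranked by exact distance. In a distributed setting, each `(signature, id)` pair produced by `build_index` would have to be routed to the partition holding its bucket, which is the kind of data movement the abstract's shuffle-efficient indexing scheme aims to reduce.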
