Home > Foreign-language journals > Theoretical and Experimental Plant Physiology > A Generic Method for Accelerating LSH-Based Similarity Join Processing

A Generic Method for Accelerating LSH-Based Similarity Join Processing

Abstract

Locality-sensitive hashing (LSH) is an efficient method for solving the problem of approximate similarity search in high-dimensional spaces. Through LSH, a high-dimensional similarity join can be processed in the same way as a hash join, making the cost of joining two large datasets linear. By judiciously analyzing the properties of multiple LSH algorithms, we propose a generic method to speed up the process of joining two large datasets using LSH. The crux of our method lies in the way in which we identify a set of representative points to reduce the number of LSH lookups. Theoretical analyses show that our proposed method can greatly reduce the number of lookup operations while retaining the same result accuracy as executing LSH lookups for every query point. Furthermore, we demonstrate the generality of our method by showing that the same principle can be applied to LSH algorithms for three different metrics: the Euclidean distance (QALSH), the Jaccard similarity measure (MinHash), and the Hamming distance (sequence hashing). Results from experimental studies using real datasets confirm our error analyses and show significant improvements of our method over the state-of-the-art LSH method: to achieve over 0.95 recall, we only need to perform LSH lookups for at most 15 percent of the query points.
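
To make the hash-join view of an LSH similarity join concrete, here is a minimal Python sketch for the Jaccard/MinHash case. It is an illustration under stated assumptions and not the authors' algorithm: the abstract does not say how representative points are chosen, so the sketch uses a deliberately simple stand-in in which query points with identical band keys share a single lookup. All function names, the banding scheme, and the parameters (`num_hashes`, `bands`, `threshold`) are hypothetical, and input sets are assumed to be non-empty.

```python
import hashlib
from collections import defaultdict


def minhash_signature(items, num_hashes=32):
    """MinHash signature of a non-empty set of strings (seeded BLAKE2b hashes)."""
    return tuple(
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest(),
                "big",
            )
            for x in items
        )
        for seed in range(num_hashes)
    )


def band_keys(sig, bands=8):
    """Split a signature into bands; each (band index, slice) is a bucket key."""
    rows = len(sig) // bands
    return tuple((b, sig[b * rows:(b + 1) * rows]) for b in range(bands))


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidate pairs."""
    return len(a & b) / len(a | b)


def build_index(dataset, num_hashes=32, bands=8):
    """Build side of the hash join: bucket key -> ids of dataset records."""
    index = defaultdict(set)
    for rid, items in dataset.items():
        for key in band_keys(minhash_signature(items, num_hashes), bands):
            index[key].add(rid)
    return index


def lsh_join(queries, dataset, threshold=0.5, num_hashes=32, bands=8):
    """Probe side: one lookup per group of queries with identical band keys.

    Grouping by identical keys is an assumption made for this sketch, not the
    representative-point selection described in the paper.
    """
    index = build_index(dataset, num_hashes, bands)
    groups = defaultdict(list)
    for qid, items in queries.items():
        groups[band_keys(minhash_signature(items, num_hashes), bands)].append(qid)

    results, lookups = set(), 0
    for keys, qids in groups.items():
        lookups += 1  # one lookup serves every query in the group
        candidates = set().union(*(index.get(k, set()) for k in keys))
        for qid in qids:
            for rid in candidates:
                if jaccard(queries[qid], dataset[rid]) >= threshold:
                    results.add((qid, rid))
    return results, lookups
```

Because queries in the same group would probe exactly the same buckets, sharing the lookup returns the same pairs as probing once per query. The method summarized in the abstract pursues the same goal of fewer lookups with a more general choice of representative points, whose details are not given here.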