An efficient similarity join approach on large-scale high-dimensional data using random projection

Abstract

Similarity join on large-scale high-dimensional data faces major challenges because of the data scale and the curse of dimensionality. Random projection with a p-stable distribution can reduce high-dimensional data from d dimensions to k dimensions (k ≪ d), and the distances between data points in the k-dimensional space can be used to filter out as many data pairs as possible at relatively low cost. Based on this idea, we propose two novel approaches to large-scale high-dimensional similarity join: the projection-based similarity join (PromSimJ) algorithm and the projection space partitioning–based similarity join (ProSPSimJ) algorithm. Comprehensive experiments were performed to test the performance of these methods. We also compared their performance with that of the naive block nested loop join method. The experimental results show that our approaches have much better performance and good scalability.
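
The filter-and-verify idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's PromSimJ or ProSPSimJ algorithm, whose details the abstract does not give. It assumes Euclidean distance and uses the Gaussian distribution, which is 2-stable, for the projection. The function names and the `slack` parameter are hypothetical and chosen only for illustration. Because the projection preserves distances only in expectation, the filter is probabilistic: it can occasionally discard a true match, and `slack` trades filtering power against that risk.

```python
import math
import random


def make_projection(d, k, seed=0):
    """Return a k x d matrix of i.i.d. N(0, 1) entries scaled by 1/sqrt(k).

    With this scaling, the expected squared distance after projection
    equals the squared distance in the original space.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional point x to k dimensions."""
    return [sum(a * v for a, v in zip(row, x)) for row in matrix]


def dist(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def similarity_join(points, eps, k, slack=1.5, seed=0):
    """Self-join: return pairs (i, j), i < j, whose distance is <= eps.

    `slack` is a hypothetical tuning knob, not a value from the paper.
    Pairs whose projected distance exceeds slack * eps are pruned
    without computing the full d-dimensional distance.
    """
    d = len(points[0])
    matrix = make_projection(d, k, seed)
    projected = [project(matrix, p) for p in points]
    result = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Cheap filter in the k-dimensional projected space.
            if dist(projected[i], projected[j]) > slack * eps:
                continue
            # Exact verification in the original d-dimensional space.
            if dist(points[i], points[j]) <= eps:
                result.append((i, j))
    return result
```

Every pair returned has passed the exact distance check, so the output contains no false positives; only recall depends on the choice of `k` and `slack`. The nested loop over all pairs mirrors the naive join structure, with the projection serving only to make most comparisons cheaper.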

Bibliographic details

  • Source
    Concurrency and Computation | 2019, Issue 20 | e5303.1-e5303.15 | 15 pages
  • Author affiliations

    School of Information and Technology, Luoyang Normal University, Luoyang, China; Henan Key Laboratory for Big Data Processing and Analytics of Electronic Commerce, Luoyang, China; Central Plains Economic Zone Wisdom Tourism Collaborative Innovation Center in Henan Province, Luoyang, China;

    School of Information and Technology, Luoyang Normal University, Luoyang, China; Central Plains Economic Zone Wisdom Tourism Collaborative Innovation Center in Henan Province, Luoyang, China;

    School of Information, Renmin University of China, Beijing, China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    high-dimensional data; p-stable distribution; random projection; similarity join;
