Journal: Concurrency and Computation: Practice and Experience

Parallel similarity joins on massive high-dimensional data using MapReduce

Abstract

In this paper, we focus on the high-dimensional similarity join (HDSJ) using the MapReduce paradigm. As the volume of data and the number of dimensions increase, the computation cost of HDSJ increases exponentially. No existing approach processes HDSJ efficiently, so we propose a novel method, symbolic aggregate approximation (SAX)-based HDSJ, to address the problem. SAX is a dimensionality reduction technique widely used in time series processing. We use SAX to represent the high-dimensional vectors and reorganize these vectors into groups based on their SAX representations. For very high-dimensional vectors, we also propose an improved SAX-based HDSJ approach. Finally, we implement SAX-based HDSJ and improved SAX-based HDSJ on Hadoop-0.20.2, perform comprehensive experiments to test their performance, and compare both with the existing method. The experimental results show that our proposed approaches perform much better than the existing method. Copyright © 2015 John Wiley & Sons, Ltd.
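
To make the grouping idea concrete, the following is a minimal sketch of textbook SAX applied to plain vectors, followed by bucketing on the resulting SAX word. It is not the paper's implementation: the abstract does not describe the actual grouping rule, the MapReduce job layout, or the improved variant, so the function names `sax_word` and `group_by_sax`, the segment count, and the four-letter alphabet are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def sax_word(vector, segments=4, alphabet="abcd"):
    """Return the SAX word of `vector` (z-normalize, PAA, then discretize).

    Hypothetical helper for illustration; parameter choices are assumptions.
    """
    mu, sigma = mean(vector), pstdev(vector)
    # A constant vector has zero variance; map it to all zeros.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in vector]

    # Piecewise aggregate approximation: mean of each equal-width chunk.
    # Assumes len(vector) is divisible by `segments` to keep the sketch short.
    width = len(z) // segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(segments)]

    # Breakpoints cut the standard normal distribution into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # Each PAA value is replaced by the symbol of the region it falls into.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


def group_by_sax(vectors, segments=4, alphabet="abcd"):
    """Bucket vector indices by SAX word (hypothetical grouping step)."""
    groups = defaultdict(list)
    for idx, vec in enumerate(vectors):
        groups[sax_word(vec, segments, alphabet)].append(idx)
    return dict(groups)


vectors = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [8, 7, 6, 5, 4, 3, 2, 1],
    [2, 4, 6, 8, 10, 12, 14, 16],
]
groups = group_by_sax(vectors)
```

In a MapReduce setting, the SAX word would be a natural candidate for the map output key, so that vectors sharing a word reach the same reducer. Whether the paper does exactly this, and whether it z-normalizes the vectors first, is not stated in the abstract. Any candidate pairs found within a group would still need an exact distance check.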
