Parallel similarity joins on massive high-dimensional data using MapReduce

Ma Youzhong; Meng Xiaofeng; Wang Shaoya


Abstract

In this paper, we focus on high-dimensional similarity join (HDSJ) using the MapReduce paradigm. As the volume of data and the number of dimensions increase, the computation cost of HDSJ increases exponentially. No existing approach can process HDSJ efficiently, so we propose a novel method, symbolic aggregate approximation (SAX)-based HDSJ, to deal with the problem. SAX is a dimensionality reduction technique that is widely used in time series processing. We use SAX to represent the high-dimensional vectors and reorganize these vectors into groups based on their SAX representations. For very high-dimensional vectors, we also propose an improved SAX-based HDSJ approach. Finally, we implement SAX-based HDSJ and improved SAX-based HDSJ on Hadoop-0.20.2 and perform comprehensive experiments to test their performance. We also compare both approaches with the existing method. The experimental results show that our proposed approaches perform much better than the existing method. Copyright © 2015 John Wiley & Sons, Ltd.
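The abstract describes representing vectors by their SAX words and grouping vectors by those words. The following is a minimal sketch of textbook SAX (z-normalization, piecewise aggregate approximation, then symbol assignment using equiprobable standard-normal breakpoints) followed by a simple grouping step. It is an illustration only and not the authors' implementation: the function names `paa`, `sax_word`, and `group_by_sax`, the parameters `segments` and `alphabet_size`, and the single-process grouping are all assumptions made for this example. It does not model the MapReduce jobs, the improved variant, or how similar vectors that fall into different groups are handled.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import NormalDist, fmean, pstdev


def paa(vector, segments):
    """Piecewise aggregate approximation: the mean of each equal-width chunk."""
    n = len(vector)
    if segments <= 0 or n % segments != 0:
        # Assumption for this sketch: the length divides evenly into segments.
        raise ValueError("len(vector) must be a positive multiple of segments")
    width = n // segments
    return [fmean(vector[i:i + width]) for i in range(0, n, width)]


def sax_word(vector, segments, alphabet_size):
    """Return the SAX word (a string over 'a', 'b', ...) for one vector."""
    mu = fmean(vector)
    sigma = pstdev(vector)
    # z-normalize; a constant vector maps to all zeros to avoid dividing by 0.
    if sigma == 0:
        normalized = [0.0] * len(vector)
    else:
        normalized = [(x - mu) / sigma for x in vector]
    # Breakpoints split the standard normal into equiprobable regions.
    std_normal = NormalDist()
    breakpoints = [std_normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    # Each segment mean becomes the letter of the region it falls in.
    return "".join(chr(ord("a") + bisect_right(breakpoints, m))
                   for m in paa(normalized, segments))


def group_by_sax(vectors, segments, alphabet_size):
    """Group vector indices by SAX word (hypothetical single-process stand-in
    for the grouping step mentioned in the abstract)."""
    groups = defaultdict(list)
    for index, vector in enumerate(vectors):
        groups[sax_word(vector, segments, alphabet_size)].append(index)
    return dict(groups)


if __name__ == "__main__":
    data = [
        [1, 2, 3, 4, 5, 6, 7, 8],
        [8, 7, 6, 5, 4, 3, 2, 1],
        [2, 4, 6, 8, 10, 12, 14, 16],
    ]
    print(group_by_sax(data, segments=4, alphabet_size=4))
```

In this sketch the first and third vectors share a SAX word because z-normalization removes their difference in scale, while the reversed vector lands in a separate group. Grouping by identical words alone can miss similar pairs whose segment means sit on opposite sides of a breakpoint, so a complete join needs additional handling that the sketch leaves out.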


Bibliographic details

Source
Concurrency and Computation: Practice and Experience | 2016, Issue 1 | pp. 166-183 | 18 pages
Authors
Ma Youzhong; Meng Xiaofeng; Wang Shaoya
Author affiliations

Luoyang Normal University School of Information and Technology Luoyang China;

Renmin University of China School of Information Beijing China;

Renmin University of China School of Information Beijing China;

NEC Laboratories China Beijing China;

Original format: PDF
Language: English
Keywords
similarity join; MapReduce; symbolic aggregate approximation; high‐dimensional data; piecewise aggregate approximation;


Similar literature


1. Set similarity join on massive probabilistic data using MapReduce [J]. Youzhong Ma, Xiaofeng Meng. Distributed and Parallel Databases. 2014, Issue 3
2. Parallelized Jaccard-based learning method and MapReduce implementation for mobile devices recognition from massive network data [J]. Jun, Liu, Yinzhou, Communications, China. 2013, Issue 7
3. MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data [J]. Jingjing Wang, Chen Lin. Computational Intelligence and Neuroscience. 2015, Issue 1
4. Efficient Similarity Joins on Massive High-Dimensional Datasets Using MapReduce [C]. Luo Wuman, Tan Haoyu, Mao Huajian. 2012 IEEE 13th International Conference on Mobile Data Management. 2012
5. ACE: Agile, Contingent and Efficient Similarity Joins Using MapReduce [D]. Lakshminarayanan, Mahalakshmi. 2013
6. MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data [O]. Jingjing Wang, Chen Lin. 2015
7. Projection Based Large Scale High-Dimensional Data Similarity Join Using MapReduce Framework [O]. Youzhong Ma, Ruiling Zhang, Zhanyou Cui. 2020
