Efficient Parallel Set-Similarity Joins Using MapReduce

Abstract

In this paper we study how to efficiently perform set-similarity joins in parallel using the popular MapReduce framework. We propose a 3-stage approach for end-to-end set-similarity joins. We take as input a set of records and output a set of joined records based on a set-similarity condition. We efficiently partition the data across nodes in order to balance the workload and minimize the need for replication. We study both self-join and R-S join cases, and show how to carefully control the amount of data kept in main memory on each node. We also propose solutions for the case where, even if we use the most fine-grained partitioning, the data still does not fit in the main memory of a node. We report results from extensive experiments on real datasets, synthetically increased in size, to evaluate the speedup and scaleup properties of the proposed algorithms using Hadoop.
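The abstract describes the three-stage approach only at a high level, so the sketch below is an illustration under stated assumptions, not the authors' Hadoop implementation. It simulates three map/reduce-style stages in memory in plain Python. It assumes Jaccard similarity with threshold `t` as the set-similarity condition and assumes that records are routed to groups by prefix filtering over a global, rarest-first token order. The decomposition is token ordering, then candidate RID-pair generation, then record join. All function names are hypothetical, and record IDs are assumed to be mutually comparable (for example, integers).

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def prefix_length(size, t):
    """Prefix length for Jaccard threshold t: size - ceil(t * size) + 1.

    The small epsilon keeps float error from pushing ceil() up by one,
    which would shorten the prefix and could drop true matches.
    """
    return size - math.ceil(t * size - 1e-9) + 1


def stage1_token_order(records):
    """Stage 1 (assumed): global token frequencies, ranked rarest-first."""
    # map: emit each distinct token of a record once; reduce: sum the counts
    freq = Counter(tok for toks in records.values() for tok in set(toks))
    # rank by (frequency, token) so ties are broken deterministically
    ranked = sorted(freq, key=lambda tok: (freq[tok], tok))
    return {tok: rank for rank, tok in enumerate(ranked)}


def stage2_rid_pairs(records, order, t):
    """Stage 2 (assumed): group records by prefix token, verify pairs per group."""
    groups = defaultdict(list)  # stands in for the shuffle: key = prefix token
    for rid, toks in records.items():
        ordered = sorted(set(toks), key=order.__getitem__)
        for tok in ordered[:prefix_length(len(ordered), t)]:
            groups[tok].append((rid, frozenset(ordered)))
    pairs = {}
    for members in groups.values():  # reduce: one call per prefix token
        for (r1, s1), (r2, s2) in combinations(members, 2):
            sim = jaccard(s1, s2)
            if sim >= t:
                # a pair sharing several prefix tokens shows up in several
                # groups; keying on the ordered RID pair removes duplicates
                pairs[(min(r1, r2), max(r1, r2))] = sim
    return pairs


def stage3_record_join(records, pairs):
    """Stage 3 (assumed): turn RID pairs back into full record pairs."""
    return [((r1, records[r1]), (r2, records[r2]), sim)
            for (r1, r2), sim in sorted(pairs.items())]


def set_similarity_self_join(records, t):
    """Self-join: all record pairs whose token sets have Jaccard >= t."""
    order = stage1_token_order(records)
    return stage3_record_join(records, stage2_rid_pairs(records, order, t))


if __name__ == "__main__":
    recs = {1: "a b c d".split(), 2: "a b c e".split(), 3: "x y z".split()}
    for left, right, sim in set_similarity_self_join(recs, 0.5):
        print(left, right, sim)
```

With `t = 0.5`, the example returns a single pair, records 1 and 2, with similarity 0.6, because they share three of five distinct tokens. The prefix filter is safe because any pair with Jaccard at least `t` must share a token within these prefixes. Verification in Stage 2 then discards false candidates. In a real cluster each stage would be a separate job, and the in-memory dictionary used for deduplication would not exist globally. The sketch also ignores node partitioning, per-node memory limits, and the R-S join case that the abstract discusses.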

Bibliographic Record

Source: ACM SIGMOD International Conference on Management of Data | 2010 | 12 pages
Authors: Rares Vernica, Michael J. Carey, Chen Li
Original format: PDF
Chinese Library Classification: Programming, software engineering
Keywords: algorithms; performance

Similar Literature

1. Grid-Based Parallel Algorithms of Join Queries for Analyzing Multi-Dimensional Data on MapReduce [J]. Miyoung Jang, Jae-Woo Chang. IEICE Transactions on Information and Systems, 2018, No. 4.
2. Parallel similarity joins on massive high-dimensional data using MapReduce [J]. Ma Youzhong, Meng Xiaofeng, Wang Shaoya. Concurrency and Computation: Practice and Experience, 2016, No. 1.
3. An efficient MapReduce algorithm for similarity join in metric spaces [J]. Liu Wen, Shen Yanming, Wang Peng. Journal of Supercomputing, 2016, No. 3.
4. Efficient Parallel Set-Similarity Joins Using MapReduce [C]. Rares Vernica, Michael J. Carey, Chen Li. ACM SIGMOD International Conference on Management of Data (SIGMOD 2010), 2010.
5. Efficient Processing of Set-Similarity Joins on Large Clusters [D]. Rares Vernica, 2011.
6. Efficient and Scalable Graph Similarity Joins in MapReduce [O]. Yifan Chen, Xiang Zhao, Chuan Xiao.
7. Efficient Parallel Set-Similarity Joins Using MapReduce [O]. Rares Vernica, Michael J. Carey, Chen Li, 2011.