Scalable Metric Similarity Join Using MapReduce

Abstract

Given two collections of objects, a metric similarity join finds all pairs of similar objects according to a given distance function in a metric space. In the era of Big Data, there is a growing demand for scalable similarity join algorithms that can support efficient query and analytical services. In this paper, we propose SMS-Join, a parallel framework based on the MapReduce paradigm that supports similarity joins in metric space. SMS-Join first selects some records as pivots in a preprocessing phase and then uses a map job to split the data into partitions based on these pivots. Finally, the join results are obtained via a reduce job. To ensure load balancing among the partitions, we devise a lightweight sampling technique that obtains high-quality samples while maintaining high performance. To reduce the partitioning cost, we develop an iterative partition strategy in the map phase. We implement our framework on the Apache Spark platform and conduct extensive experiments on four real-world datasets. The results show that our method significantly outperforms state-of-the-art methods.
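
The pivot, map, and reduce workflow summarized above can be illustrated with a small single-process Python sketch. It is not the SMS-Join implementation: the pivots are a plain uniform random sample (the hypothetical `choose_pivots`, not the paper's lightweight sampling technique), the iterative partition strategy is not modelled, and the map and reduce jobs are simulated with ordinary lists instead of Spark. The partitioning rule is an assumption, chosen as a common pivot-based scheme: every object of R goes to the partition of its nearest pivot, and every object of S is copied to each partition that the triangle inequality cannot rule out, so no pair within distance `eps` is lost. All function names here (`map_phase`, `reduce_phase`, `metric_similarity_join`) are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Callable, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]
Record = Tuple[int, Tuple[str, int]]  # (partition id, (side "R"/"S", object index))


def choose_pivots(R: Sequence[Point], S: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Hypothetical preprocessing step: a plain uniform random sample of k objects.
    This stands in for, and is NOT, the paper's lightweight sampling technique."""
    pool = list(R) + list(S)
    return random.Random(seed).sample(pool, min(k, len(pool)))


def map_phase(R, S, pivots, eps, dist) -> List[Record]:
    """Simulated map job: emit (partition id, (side, index)) records."""
    records: List[Record] = []
    for i, r in enumerate(R):
        d = [dist(r, p) for p in pivots]
        # Each R object is sent only to the partition of its nearest pivot,
        # so every result pair is produced in exactly one partition.
        records.append((d.index(min(d)), ("R", i)))
    for j, s in enumerate(S):
        d = [dist(s, p) for p in pivots]
        d_min = min(d)
        # Triangle inequality: if r's nearest pivot is p_k, then
        # dist(r, s) >= (d[k] - d_min) / 2. Copy s only to partitions where
        # this lower bound is at most eps, so no matching pair is missed.
        for k, d_k in enumerate(d):
            if d_k - d_min <= 2 * eps:
                records.append((k, ("S", j)))
    return records


def reduce_phase(records, R, S, eps, dist) -> Set[Tuple[int, int]]:
    """Simulated shuffle + reduce job: verify candidate pairs per partition."""
    groups = defaultdict(lambda: ([], []))  # partition id -> (R indices, S indices)
    for k, (side, idx) in records:
        groups[k][0 if side == "R" else 1].append(idx)
    result: Set[Tuple[int, int]] = set()
    for r_ids, s_ids in groups.values():
        for i in r_ids:
            for j in s_ids:
                if dist(R[i], S[j]) <= eps:
                    result.add((i, j))
    return result


def metric_similarity_join(R, S, eps, dist: Metric = math.dist, num_pivots=8, seed=0):
    """Return all index pairs (i, j) with dist(R[i], S[j]) <= eps."""
    pivots = choose_pivots(R, S, num_pivots, seed)
    if not pivots:  # both inputs are empty
        return set()
    records = map_phase(R, S, pivots, eps, dist)
    return reduce_phase(records, R, S, eps, dist)
```

For example, `metric_similarity_join([(0.0, 0.0), (5.0, 5.0)], [(0.5, 0.0), (9.0, 9.0)], eps=1.0)` returns `{(0, 0)}`. Any function satisfying the metric axioms can be passed as `dist`; the pruning rule in `map_phase` is safe only because of the triangle inequality. How well the partitions are balanced depends entirely on the pivots, which is the problem the paper's sampling technique targets.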

Bibliographic Information

Source
IEEE International Conference on Data Engineering | 2019 | pp. 1662-1665 | 4 pages
Authors
Jiacheng Wu; Yong Zhang; Jin Wang; Chunbin Lin; Yingjia Fu; Chunxiao Xing
Original format: PDF
Keywords
Extraterrestrial measurements; Kernel; Sampling methods; Computer science; Big Data; Silicon

Similar Literature

1. Gang Chen, Keyu Yang, Lu Chen. Metric Similarity Joins Using MapReduce [J]. IEEE Transactions on Knowledge and Data Engineering, 2017, No. 3.
2. Gang Chen, Keyu Yang, Lu Chen. Metric Similarity Joins Using MapReduce [J]. Theoretical and Experimental Plant Physiology, 2017, No. 3.
3. Liu Wen, Shen Yanming, Wang Peng. An efficient MapReduce algorithm for similarity join in metric spaces [J]. The Journal of Supercomputing, 2016, No. 3.
4. Jiacheng Wu, Yong Zhang, Jin Wang. Scalable Metric Similarity Join Using MapReduce [C]. IEEE International Conference on Data Engineering, 2019.
5. Mahalakshmi Lakshminarayanan. ACE: Agile, Contingent and Efficient Similarity Joins Using MapReduce [D]. 2013.
6. Jingjing Wang, Chen Lin. MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data [O]. 2015.
7. Youzhong Ma, Ruiling Zhang, Zhanyou Cui. Projection Based Large Scale High-Dimensional Data Similarity Join Using MapReduce Framework [O]. 2020.
