A Density-Aware Similarity Join Query Processing Algorithm on MapReduce

Abstract

The volume of data is growing rapidly, and MapReduce has attracted much interest as a paradigm for data-intensive applications. Similarity join is an essential operation in data analytics, with uses including record linkage, near-duplicate detection, and document clustering. However, the performance of MapReduce is limited when it is applied to complex analytical tasks that join multiple datasets. Workload-aware data partitioning techniques are therefore required to balance the computation across machines. In this paper, we propose a similarity join algorithm on MapReduce that provides scalability and high performance by using a grid-based data mapping technique for joining datasets. Our experimental analysis shows that the algorithm outperforms the existing algorithm under various data sizes and similarity thresholds.
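
The grid-based mapping idea can be illustrated with a minimal in-memory sketch. This is not the authors' algorithm: the abstract does not say how cells are sized or balanced by density, so the sketch assumes a uniform grid with cell side equal to the distance threshold `eps`, Euclidean distance as the similarity measure, and plain Python functions standing in for the map and reduce phases. All function names here are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def cell_of(point, cell_size):
    """Return the integer grid coordinates of the cell containing `point`."""
    return tuple(floor(c / cell_size) for c in point)


def map_phase(r_points, s_points, eps):
    """Assign points to grid cells of side `eps` (stand-in for the map step).

    Each R point goes to its own cell only. Each S point is replicated to its
    own cell and every adjacent cell, so any pair within `eps` meets in
    exactly one cell: the one that holds the R point.
    """
    buckets = defaultdict(lambda: ([], []))
    for r in r_points:
        buckets[cell_of(r, eps)][0].append(r)
    for s in s_points:
        home = cell_of(s, eps)
        for offset in product((-1, 0, 1), repeat=len(home)):
            neighbor = tuple(h + o for h, o in zip(home, offset))
            buckets[neighbor][1].append(s)
    return buckets


def reduce_phase(buckets, eps):
    """Compare R and S points that share a cell (stand-in for the reduce step)."""
    result = []
    for r_list, s_list in buckets.values():
        for r in r_list:
            for s in s_list:
                if dist(r, s) <= eps:
                    result.append((r, s))
    return result


def grid_similarity_join(r_points, s_points, eps):
    """Return all (r, s) pairs whose Euclidean distance is at most `eps`."""
    return reduce_phase(map_phase(r_points, s_points, eps), eps)
```

With a uniform grid, a dense region puts many points into one cell and therefore onto one reducer, which is the imbalance that workload-aware partitioning is meant to address. A density-aware variant would need to split or merge cells based on point counts, which this sketch leaves out.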

Bibliographic record

Source
International Conference on Future Information Technology; International Conference on Multimedia and Ubiquitous Engineering | 2016 | pp. 469-475 | 7 pages
Authors
Miyoung Jang; Youngho Song; Jae-Woo Chang;
Original format: PDF
Keywords
Cloud computing; Bigdata analysis; Similarity join; MapReduce; Grid-based partitioning;

Similar literature

1. Grid-Based Parallel Algorithms of Join Queries for Analyzing Multi-Dimensional Data on MapReduce [J]. Miyoung JANG, Jae-Woo CHANG. IEICE Transactions on Information and Systems. 2018, No. 4

2. Prefetched wald adaptive boost classification based Czekanowski similarity MapReduce for user query processing with bigdata [J]. Selvan S. Tamil, Balamurugan P., Vijayakumar M. Distributed and Parallel Databases. 2021, No. 4

3. SigMR: MapReduce-based SPARQL query processing by signature encoding and multi-way join [J]. Ahn Jinhyun, Im Dong-Hyuk, Kim Hong-Gee. Journal of Supercomputing. 2015, No. 10

4. A Density-Aware Similarity Join Query Processing Algorithm on MapReduce [C]. Miyoung Jang, Youngho Song, Jae-Woo Chang. International Conference on Future Information Technology. 2016

5. ACE: Agile, Contingent and Efficient Similarity Joins Using MapReduce [D]. Lakshminarayanan, Mahalakshmi. 2013

6. MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data [O]. Jingjing Wang, Chen Lin. 2015

7. Grid-Based Parallel Algorithms of Join Queries for Analyzing Multi-Dimensional Data on MapReduce [O]. Miyoung JANG, Jae-Woo CHANG. 2018

8. Interactive Query Processing in Big Data Systems: A Cross Industry Study of MapReduce Workloads [R]. R. H. Katz, S. Alspaugh, Y. Chen. 2012
