Optimization for Large-Scale Fuzzy Joins Using Fuzzy Filters in MapReduce


Abstract

A fuzzy (or similarity) join is one of the most useful data processing and analysis operations for Big Data. It combines pairs of tuples whose distance is lower than or equal to a given threshold ε. The fuzzy join is used in many practical applications, but it is extremely costly in time and space, and may not even be executable on large-scale datasets. Although some studies have improved its performance by applying filters, an effective fuzzy filter for the join has not yet been proposed. In this paper, we therefore extend our previous work by proposing a novel fuzzy filter to optimize fuzzy joins. This filter is a compact, probabilistic data structure that maintains a bit matrix and supports very fast similarity queries, with a small false positive rate and a zero false negative rate. We show that our proposal is more efficient than others because it eliminates redundant data, reduces computation cost, and avoids duplicate output.
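To make the join definition concrete, here is a minimal sketch of an unoptimized fuzzy join. It assumes Hamming distance over equal-length bit strings (Hamming distance appears among the paper's keywords) and compares every pair, which is the quadratic cost a filter is meant to cut down. The names `hamming` and `naive_fuzzy_join` and the sample data are hypothetical. This is not the authors' filter, nor a MapReduce implementation.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def naive_fuzzy_join(left, right, eps):
    """Return every (l, r) pair whose Hamming distance is at most eps.

    Brute force: all len(left) * len(right) pairs are compared.
    """
    return [(l, r) for l, r in product(left, right) if hamming(l, r) <= eps]


# Hypothetical sample inputs, for illustration only.
R = ["10110", "00000", "11111"]
S = ["10111", "01000", "11011"]
pairs = naive_fuzzy_join(R, S, eps=1)
print(pairs)
```

With ε = 1, four of the nine candidate pairs are kept. A filter with zero false negatives could discard some of the other five before the distance computation without losing any true match.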


Bibliographic details

Source
IEEE International Conference on Fuzzy Systems | 2020 | pp. 1-8 | 8 pages
Authors
Thi-To-Quyen TRAN; Thuong-Cang PHAN; Anne LAURENT; Laurent D’ORAZIO
Original format: PDF
Keywords
Distributed databases; Hamming distance; Indexes; Redundancy; Task analysis; Optimization; Big Data


Similar literature


1. Fuzzy joins using MapReduce. [J]. G. Albeanu. Computing reviews. 2013, No. 8
2. MapReduce-based fast fuzzy c-means algorithm for large-scale underwater image segmentation. [J]. Xiu Li, Jingdong Song, Fan Zhang. Future generation computer systems. 2016, No. DECa
3. Fuzzy Belongingness, Fuzzy Quasi-coincidence and Convergence of Generalized Fuzzy Filters. [J]. K. K. Mondal, S. K. Samanta. The Journal of fuzzy mathematics. 2007, No. 4
4. Improving Hamming distance-based fuzzy join in MapReduce using Bloom Filters. [C]. Thi-To-Quyen TRAN, Thuong-Cang PHAN, Anne LAURENT. IEEE International Conference on Fuzzy Systems. 2018
5. Fuzzy search strategy generation for adversarial systems using fuzzy process particle swarm optimization, fuzzy patterns, and a hunch factor. [D]. Coffman-Wolph, Stephany. 2013
6. A time series driven decomposed evolutionary optimization approach for reconstructing large-scale gene regulatory networks based on fuzzy cognitive maps. [O]. Jing Liu, Yaxiong Chi, Chen Zhu. 2017
7. Fuzzy Joins Using MapReduce. [O]. Foto N. Afrati, Anish Das Sarma, David Menestrina. 2013
8. Fuzzy set applications in engineering optimization: Multilevel fuzzy optimization. [R]. Diaz, Alejandro R. 1989
