DBSCAN on Resilient Distributed Datasets


Abstract

DBSCAN is a well-known density-based clustering algorithm that is widely used because it can find arbitrarily shaped clusters in noisy data. However, DBSCAN is hard to scale, which limits its utility on large data sets. Resilient Distributed Datasets (RDDs), on the other hand, are a fast data-processing abstraction created explicitly for in-memory computation on large data sets. This paper presents RDD-DBSCAN, a new algorithm based on DBSCAN that uses the Resilient Distributed Datasets approach. RDD-DBSCAN overcomes the scalability limitations of the traditional DBSCAN algorithm by operating in a fully distributed fashion. The paper also evaluates an implementation of RDD-DBSCAN using Apache Spark, the official RDD implementation.
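For orientation, the following is a minimal single-machine sketch of the classic DBSCAN procedure that the abstract refers to. It is not the RDD-DBSCAN algorithm from the paper. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices, and the brute-force neighbor search is an assumption made for brevity.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

In this sketch every neighbor query scans the whole data set, so the run time grows quadratically with the number of points. That is one simple way to see why the single-machine algorithm becomes hard to scale.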


Bibliographic details

Source
International Conference on High Performance Computing Simulation | 2015 | 10 pages
Authors
Cordova Irving; Teng-Sheng Moh
Original format: PDF
Chinese Library Classification: General issues
Keywords
Apache Spark; DBSCAN; MapReduce; Resilient Distributed Datasets; data clustering; data partition; parallel systems


