A Parallel DBSCAN Algorithm Based on Spark

Abstract

With the explosive growth of data, we have entered the era of big data. To sift through masses of information, many data mining algorithms are being implemented in parallel. Cluster analysis occupies a pivotal position in data mining, and DBSCAN is one of the most widely used clustering algorithms. However, existing parallel DBSCAN algorithms usually divide the original database into several disjoint partitions, and as the data dimension increases, splitting and consolidating the high-dimensional space consumes a lot of time. To solve this problem, this paper proposes a parallel DBSCAN algorithm based on Spark (S_DBSCAN), which can quickly partition the original data and combine the clustering results. The algorithm consists of the following steps: 1) partitioning the raw data based on a random sample, 2) running local DBSCAN on each partition in parallel, and 3) merging the data partitions based on the centroid. Experimental results show that, compared with the traditional DBSCAN algorithm, the proposed S_DBSCAN algorithm provides better operating efficiency and scalability.
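
The three steps named in the abstract can be illustrated with a small single-machine sketch. The abstract does not give the details of S_DBSCAN, so everything below is an assumption made for illustration: partitions are formed by assigning each point to its nearest randomly sampled pivot, the local step is a plain O(n^2) DBSCAN, and local clusters are merged when their centroids lie within a hypothetical `merge_dist` threshold. The function names are likewise hypothetical, and Spark is replaced by an ordinary loop over partitions.

```python
import math
import random
from collections import defaultdict


def partition_by_sample(points, num_partitions, rng):
    """Step 1 (assumed form): sample pivots, assign each point to the nearest."""
    pivots = rng.sample(points, num_partitions)
    parts = defaultdict(list)
    for p in points:
        idx = min(range(num_partitions), key=lambda i: math.dist(p, pivots[i]))
        parts[idx].append(p)
    return list(parts.values())


def local_dbscan(points, eps, min_pts):
    """Step 2: plain O(n^2) DBSCAN on one partition; label -1 means noise."""
    def region(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region(j)
            if len(nb) >= min_pts:  # only core points expand the cluster
                queue.extend(nb)
    return labels


def centroid(pts):
    dim = len(pts[0])
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))


def merge_by_centroid(local_clusters, merge_dist):
    """Step 3 (assumed form): union clusters whose centroids are close."""
    parent = list(range(len(local_clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    cents = [centroid(c) for c in local_clusters]
    for i in range(len(cents)):
        for j in range(i + 1, len(cents)):
            if math.dist(cents[i], cents[j]) <= merge_dist:
                parent[find(i)] = find(j)
    merged = defaultdict(list)
    for i, c in enumerate(local_clusters):
        merged[find(i)].extend(c)
    return list(merged.values())


def s_dbscan_sketch(points, eps, min_pts, num_partitions, merge_dist, seed=0):
    rng = random.Random(seed)
    parts = partition_by_sample(points, num_partitions, rng)
    local_clusters = []
    for part in parts:  # on Spark this would be a per-partition map
        labels = local_dbscan(part, eps, min_pts)
        groups = defaultdict(list)
        for p, lab in zip(part, labels):
            if lab != -1:  # points that are noise locally are dropped here
                groups[lab].append(p)
        local_clusters.extend(groups.values())
    return merge_by_centroid(local_clusters, merge_dist)
```

Called as `s_dbscan_sketch(points, eps=0.5, min_pts=2, num_partitions=2, merge_dist=1.0)` on a list of coordinate tuples, this returns a list of merged clusters. A point that is isolated inside its partition is discarded as noise even if it would have joined a cluster in a global run, which is one reason this sketch should not be read as the paper's method.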

Bibliographic details

Source
2016 IEEE International Conferences on Big Data and Cloud Computing, Social Computing and Networking, Sustainable Computing and Communication | 2016 | pp. 548-553 | 6 pages
Conference location: Atlanta (US)
Authors
Guangchun Luo; Xiaoyu Luo; Thomas Fairley Gooch; Ling Tian; Ke Qin;
Author affiliations

Sch. of Comput. Sci. Eng., Univ. of Electron. Sci. Technol. of China, Chengdu, China;

Sch. of Comput. Sci. Eng., Univ. of Electron. Sci. Technol. of China, Chengdu, China;

Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA;

Sch. of Comput. Sci. Eng., Univ. of Electron. Sci. Technol. of China, Chengdu, China;

Sch. of Comput. Sci. Eng., Univ. of Electron. Sci. Technol. of China, Chengdu, China;

Original format: PDF
Language: English
Keywords
Clustering algorithms; Sparks; Algorithm design and analysis; Partitioning algorithms; Data mining; Machine learning algorithms; Merging;

Similar literature

1. DBSCAN-PSM: an improvement method of DBSCAN algorithm on Spark [J]. Guangsheng Chen, Yiqun Cheng, Weipeng Jing. International Journal of High Performance Computing and Networking. 2019, No. 4
2. K-DBSCAN: an efficient density-based clustering algorithm supports parallel computing [J]. Chao Deng, Jinwei Song, Saihua Cai, International Journal of Simulation & Process Modelling. 2018, No. 5
3. MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data [J]. Yaobin HE, Haoyu TAN, Wuman LUO, Frontiers of computer science in China. 2014, No. 1
4. A Parallel DBSCAN Algorithm Based on Spark [C]. Guangchun Luo, Xiaoyu Luo, Thomas Fairley Gooch, IEEE International Conference on Big Data and Cloud Computing. 2016
5. Agent Based Parallelization of Computational Geometry Algorithms [D]. Gokulramkumar, Saranya. 2020
6. Spark-Based Parallel Genetic Algorithm for Simulating a Solution of Optimal Deployment of an Underwater Sensor Network [O]. Peng Liu, Shuai Ye, Can Wang, 2019
7. Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform [O]. Fang Huang, Qiang Zhu, Ji Zhou, 2017
