International Journal of High Performance Computing and Networking

DBSCAN-PSM: an improvement method of DBSCAN algorithm on Spark

Abstract

DBSCAN is a density-based clustering algorithm that is widely used in image processing, data mining, machine learning, and other fields. As the size of clusters increases, parallel DBSCAN algorithms have come into wide use. However, we consider the current partitioning methods for DBSCAN too simple, and on Spark the GETNEIGHBORS query step accesses the dataset repeatedly. We therefore propose DBSCAN-PSM, which applies new data partitioning and merging methods. In the first stage, we introduce a KD-tree and combine partitioning with the GETNEIGHBORS query, which reduces the number of accesses to the dataset and lessens the impact of I/O on the algorithm. In the second stage, we use features of the points during merging to avoid the time cost of global labeling. Experimental results show that the new method improves both parallel efficiency and clustering performance.
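
As a rough illustration of why an index helps the GETNEIGHBORS step, the sketch below runs plain DBSCAN on a single in-memory partition and answers each eps-neighborhood query from a KD-tree instead of scanning every point. It is not the DBSCAN-PSM implementation: the Spark partitioning and the merging stage are omitted. The names `build_kdtree`, `get_neighbors`, and `dbscan`, the tuple-based tree layout, and the Euclidean metric are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # hypothetical label for points that belong to no cluster


def build_kdtree(points, indices=None, depth=0):
    """Build a KD-tree as nested tuples: (point_index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])  # cycle through the dimensions
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2  # median point becomes the split node
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))


def get_neighbors(tree, points, query, eps, found=None):
    """Return the indices of all points within eps of query."""
    if found is None:
        found = []
    if tree is None:
        return found
    idx, axis, left, right = tree
    if dist(points[idx], query) <= eps:
        found.append(idx)
    diff = query[axis] - points[idx][axis]
    # Descend into a subtree only if the eps-ball around the query can reach it.
    if diff <= eps:
        get_neighbors(left, points, query, eps, found)
    if diff >= -eps:
        get_neighbors(right, points, query, eps, found)
    return found


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    tree = build_kdtree(points)  # built once, reused by every query
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = get_neighbors(tree, points, points[i], eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = get_neighbors(tree, points, points[j], eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels
```

Calling `dbscan(points, eps, min_pts)` with a list of equal-length coordinate tuples returns one label per point. The tree is built once per partition, so each neighborhood query prunes whole subtrees rather than revisiting the full point list.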

Bibliographic details

  • Source
  • Author affiliations

    Heilongjiang Province Engineering Technology Research Center for Forestry Ecological Big Data Storage and High Performance (Cloud) Computing, College of Information and Computer Engineering, Northeast Forestry University

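  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Computing technology, computer technology
  • Keywords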

    Big data; DBSCAN; Data partitioning; Data merging;
