
DBSCAN on Resilient Distributed Datasets


Abstract

DBSCAN is a well-known density-based clustering algorithm, widely used for its ability to find arbitrarily shaped clusters in noisy data. However, DBSCAN is hard to scale, which limits its utility on large data sets. Resilient Distributed Datasets (RDDs), on the other hand, are a fast data-processing abstraction created explicitly for in-memory computation on large data sets. This paper presents RDD-DBSCAN, a new algorithm based on DBSCAN that uses the Resilient Distributed Datasets approach. RDD-DBSCAN overcomes the scalability limitations of the traditional DBSCAN algorithm by operating in a fully distributed fashion. The paper also evaluates an implementation of RDD-DBSCAN on Apache Spark, the official RDD implementation.
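For readers unfamiliar with the baseline, the following is a minimal single-machine sketch of classic DBSCAN in plain Python. It is not the RDD-DBSCAN algorithm the paper presents, and it does not use Spark. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices, not taken from the paper. The naive neighbor search here compares every point against every other point, which gives a rough sense of why the traditional algorithm becomes expensive on large data sets.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may still be claimed later as a border point.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Two tight groups of three points and one isolated outlier.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(sample, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Each call to `region_query` scans the full point list, so the sketch does quadratic work overall. A spatial index would reduce that on one machine, but the cluster expansion loop is still sequential, which is the kind of limitation a distributed formulation aims to address.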
