Hashing-Based Approximate DBSCAN

Abstract

Analyzing massive amounts of data and extracting value from them has become key across disciplines. As data volumes grow rapidly, however, current approaches to data analysis struggle to keep up. This is particularly true for clustering algorithms, where distance calculations between pairs of points dominate the overall running time. Data analysis and clustering are rarely straightforward, however: parameters typically need to be determined over several iterations. Entirely accurate results are thus rarely needed, and we can instead sacrifice some precision in the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach to approximating DBSCAN. ADvaNCE uses two measures to reduce the overhead of distance calculations: (1) locality-sensitive hashing to approximate and speed up distance calculations and (2) representative point selection to reduce the number of distance calculations. Our experiments show that our approach is in general one order of magnitude faster (at most 30x in our experiments) than the state of the art.
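
To make the first measure concrete, the following is a minimal sketch of how locality-sensitive hashing can cut down the distance calculations in a DBSCAN-style neighborhood search. It is not the ADvaNCE algorithm: the abstract does not specify the hash family, the parameters, or how distances are approximated, so the Gaussian random-projection hash, the bucket width of `4 * eps`, and the names `make_hash`, `lsh_neighbors`, and `core_points` are all assumptions made for illustration. The sketch also verifies each candidate pair with an exact distance, and it omits representative point selection entirely.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, rng):
    """One Gaussian random-projection hash: h(x) = floor((a.x + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    return lambda x: math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)


def lsh_neighbors(points, eps, num_tables=8, hashes_per_table=4, seed=0):
    """Approximate eps-neighborhoods: only points sharing a bucket are compared."""
    rng = random.Random(seed)
    dim = len(points[0])
    width = 4.0 * eps  # assumed bucket width; a tuning knob, not from the paper
    neighbors = [set() for _ in points]
    for _ in range(num_tables):
        hashes = [make_hash(dim, width, rng) for _ in range(hashes_per_table)]
        buckets = defaultdict(list)
        for i, p in enumerate(points):
            buckets[tuple(h(p) for h in hashes)].append(i)
        # Distances are computed only within a bucket, not over all pairs.
        for members in buckets.values():
            for pos, i in enumerate(members):
                for j in members[pos + 1:]:
                    if math.dist(points[i], points[j]) <= eps:
                        neighbors[i].add(j)
                        neighbors[j].add(i)
    return neighbors


def core_points(points, eps, min_pts, **lsh_kwargs):
    """Indices whose approximate neighborhood (including the point) has >= min_pts."""
    nbrs = lsh_neighbors(points, eps, **lsh_kwargs)
    return [i for i, n in enumerate(nbrs) if len(n) + 1 >= min_pts]
```

Because candidate pairs are verified exactly, this sketch never reports a false neighbor, but it can miss true neighbors that never land in a common bucket. Raising `num_tables` lowers that miss rate at the cost of more hashing work, which is the precision-for-speed trade-off the abstract describes.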
