
Hashing-Based Approximate DBSCAN



Abstract

Analyzing massive amounts of data and extracting value from them has become key across disciplines. As data volumes grow rapidly, however, current approaches to data analysis struggle to keep up. This is particularly true for clustering algorithms, where distance calculations between pairs of points dominate the overall running time. Data analysis and clustering are, however, rarely straightforward: parameters typically need to be determined over several iterations. Entirely accurate results are thus rarely needed, and we can instead sacrifice some precision in the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach to approximating DBSCAN. ADvaNCE uses two measures to reduce the overhead of distance calculations: (1) locality-sensitive hashing to approximate and speed up distance calculations, and (2) representative point selection to reduce the number of distance calculations. Our experiments show that our approach is in general one order of magnitude faster than the state of the art (at most 30× in our experiments).
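To make the first measure concrete, the following is a minimal sketch of how locality-sensitive hashing can restrict DBSCAN's distance calculations to points that share a hash bucket. It is not the ADvaNCE implementation: the function names, the Gaussian random-projection hash, the bucket width of 4 × eps, and the default table counts are all illustrative assumptions, and representative point selection is not shown.

```python
import math
import random
from collections import defaultdict


def lsh_tables(points, eps, num_tables=8, num_projections=2, seed=0):
    """Hash every point into several tables using Gaussian random projections.

    Each key is a tuple of floor((a . x + b) / width) values, so points that
    are close in Euclidean distance are likely to share a bucket.
    The bucket width 4 * eps is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    width = 4.0 * eps
    tables = []
    for _ in range(num_tables):
        projections = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
            for _ in range(num_projections)
        ]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            key = tuple(
                math.floor((sum(ai * xi for ai, xi in zip(a, p)) + b) / width)
                for a, b in projections
            )
            buckets[key].append(idx)
        tables.append(buckets)
    return tables


def approx_dbscan(points, eps, min_pts, **lsh_kwargs):
    """Approximate DBSCAN: distances are computed only between bucket mates.

    Returns one label per point; -1 marks noise. Neighbors that never collide
    in any table are missed, which is the source of the approximation.
    """
    n = len(points)
    candidates = [set() for _ in range(n)]
    for buckets in lsh_tables(points, eps, **lsh_kwargs):
        for members in buckets.values():
            for i in members:
                candidates[i].update(members)

    # Exact distance check, but only on the LSH candidates (includes the point itself).
    neighbors = [
        [j for j in candidates[i] if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:  # expand the cluster through core points only
            i = stack.pop()
            for j in neighbors[i]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if is_core[j]:
                        stack.append(j)
        cluster += 1
    return labels
```

In this sketch, adding tables lowers the chance of missing a true neighbor at the cost of more candidate pairs, while adding projections per table shrinks the buckets and has the opposite effect.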
