Hashing-Based Approximate DBSCAN


Abstract

Analyzing massive amounts of data and extracting value from it has become key across different disciplines. As the amounts of data grow rapidly, however, current approaches for data analysis struggle. This is particularly true for clustering algorithms where distance calculations between pairs of points dominate overall time. Crucial to the data analysis and clustering process, however, is that it is rarely straightforward. Instead, parameters need to be determined through several iterations. Entirely accurate results are thus rarely needed and instead we can sacrifice precision of the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach to approximating DBSCAN. ADvaNCE uses two measures to reduce distance calculation overhead: (1) locality sensitive hashing to approximate and speed up distance calculations and (2) representative point selection to reduce the number of distance calculations. Our experiments show that our approach is in general one order of magnitude faster (at most 30x in our experiments) than the state of the art.
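
The first measure named in the abstract, locality-sensitive hashing, can be illustrated with a minimal sketch. This is not the ADvaNCE implementation: it assumes Euclidean distance and a standard random-projection (p-stable) hash family, and the names `make_hash`, `build_tables`, `approx_neighbors` and the parameters `num_tables`, `k`, `w` are hypothetical. The sketch still computes exact distances, but only for candidates that share a hash bucket with the query point, so some true neighbors may be missed.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, k, w, rng):
    """Build one hash function from k random projections (assumed p-stable LSH)."""
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
        for _ in range(k)
    ]

    def h(point):
        # Each projection maps the point to an integer cell of width w.
        return tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / w)
            for a, b in projections
        )

    return h


def build_tables(points, num_tables=8, k=2, w=1.0, seed=0):
    """Hash every point into num_tables independent bucket tables."""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(num_tables):
        h = make_hash(dim, k, w, rng)
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            buckets[h(p)].append(idx)
        tables.append((h, buckets))
    return tables


def approx_neighbors(points, tables, idx, eps):
    """Approximate eps-neighborhood: verify only points sharing a bucket."""
    candidates = set()
    for h, buckets in tables:
        candidates.update(buckets[h(points[idx])])
    return sorted(
        j for j in candidates if math.dist(points[idx], points[j]) <= eps
    )


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
tables = build_tables(points)
neighbors_of_0 = approx_neighbors(points, tables, 0, eps=0.5)
```

A DBSCAN-style core-point test would then compare `len(neighbors_of_0)` against a minimum-points threshold. Raising `num_tables` lowers the chance of missing a true neighbor at the cost of more hashing work.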


Bibliographic details

Source
East European Conference on Advances in Databases and Information Systems | 2016 | pp. 31-45 | 15 pages
Authors
Tianrun Li; Thomas Heinis; Wayne Luk
Original format: PDF

Similar literature


1. AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities [J]. Kim Jeong-Hun, Choi Jong-Hyeok, Yoo Kwan-Hee. Journal of Supercomputing. 2019, no. 1

2. On Hashing-Based Approaches to Approximate DNF-Counting [J]. Kuldeep S. Meel, Aditya A. Shrotri, Moshe Y. Vardi. LIPIcs: Leibniz International Proceedings in Informatics. 2018, no. 23

3. DBScan and WrapDBScan methods applying for intellectual variance analysis in employee’s moving [J]. P.A. Savenkov, A.N. Ivutin. Procedia Computer Science. 2021, no. a

4. Hashing-Based Approximate DBSCAN [C]. Tianrun Li, Thomas Heinis, Wayne Luk. East European Conference on Advances in Databases and Information Systems. 2016

5. A generic attack on hashing-based software tamper resistance [D]. Wurster, Glenn. 2005

6. An Active Learning Method Based on Variational Autoencoder and DBSCAN Clustering [O]. Fang Chen, Tao Zhang, Ruilin Liu. 2021

7. Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors [O]. Andoni, Alexandr; Laarhoven, Thijs; Razenshteyn, Ilya. 2017

