Pattern Analysis and Applications

BISDBx: towards batch-incremental clustering for dynamic datasets using SNN-DBSCAN



Abstract

Many important applications, such as recommender systems, e-commerce sites, and web crawlers, involve dynamic datasets. Dynamic datasets undergo frequent changes in the form of insertions or deletions of data that affect their size. A naive algorithm may not process these frequent changes efficiently, as it involves the entire set of data points each time a change is made. Fast incremental algorithms process these updates efficiently by avoiding redundant computation. In this article, we propose incremental extensions to the shared nearest neighbor density-based clustering (SNNDB) algorithm for both addition and deletion of data points. The existing incremental extension to SNNDB, viz. InSDB, cannot handle deletion and handles insertions one point at a time. Our method overcomes both of these bottlenecks by efficiently identifying the affected parts of clusters while processing updates to the dataset in batch mode. We propose three batch-mode incremental variants of SNNDB for both addition and deletion, with the third variant being the most effective. Experimental observations on real-world and synthetic datasets showed that our algorithms are up to 4 orders of magnitude faster than the naive SNNDB algorithm and about 2 orders of magnitude faster than the pointwise incremental method.
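
The abstract does not describe how the affected parts of clusters are identified, so the following is only a rough sketch of the general idea and not the BISDBx procedure. It assumes a brute-force k-nearest-neighbor (KNN) index and treats an existing point as "affected" by a batch insertion when at least one new point enters its KNN list. All function names (`build_knn`, `insert_batch`, `snn_similarity`) and the parameter `k` are hypothetical and chosen for illustration.

```python
import math

def build_knn(points, k):
    """Brute-force KNN lists: for each point, its k closest (distance, index) pairs, sorted."""
    knn = []
    for i, p in enumerate(points):
        cand = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        knn.append(cand[:k])
    return knn

def insert_batch(points, knn, batch, k):
    """Append `batch` to `points`, update `knn` in place, and return the
    indices of previously existing points whose KNN list changed."""
    n_old = len(points)
    points.extend(batch)
    affected = set()
    # An old point is affected only if a new point beats its current k-th neighbor.
    for i in range(n_old):
        for j in range(n_old, len(points)):
            entry = (math.dist(points[i], points[j]), j)
            if len(knn[i]) < k or entry < knn[i][-1]:
                knn[i] = sorted(knn[i] + [entry])[:k]
                affected.add(i)
    # New points have no prior state, so their KNN lists are built from scratch.
    for j in range(n_old, len(points)):
        cand = sorted((math.dist(points[j], q), m) for m, q in enumerate(points) if m != j)
        knn.append(cand[:k])
    return affected

def snn_similarity(knn, i, j):
    """Shared-nearest-neighbor similarity: number of neighbors common to both KNN lists."""
    return len({idx for _, idx in knn[i]} & {idx for _, idx in knn[j]})
```

In this sketch, only the points returned by `insert_batch` (plus the new points themselves) would need their shared-nearest-neighbor similarities and density labels re-evaluated; the rest of the clustering could be left untouched. Deletion is not covered here.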
