Journal: Knowledge-Based Systems

SDCOR: Scalable density-based clustering for local outlier detection in massive-scale datasets


Abstract

This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets. Unlike well-known traditional algorithms, which assume that all the data is memory-resident, the proposed method is scalable and processes the input data chunk by chunk within the confines of a limited memory buffer. A temporary clustering model is built in the first phase and then gradually updated by analyzing consecutive memory loads of points. At the end of this scalable clustering, the approximate structure of the original clusters is obtained. Finally, in another scan of the entire dataset and using a suitable criterion, each object is assigned an outlying score called SDCOR (Scalable Density-based Clustering Outlierness Ratio). Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time complexity and is more effective and efficient than the best-known conventional density-based methods, which need to load all data into memory, and also than some fast distance-based methods, which can operate on disk-resident data. (C) 2021 Elsevier B.V. All rights reserved.
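The two-pass structure described in the abstract (build and update a clustering model chunk by chunk, then rescan the data to score each object) can be illustrated with a small sketch. This is not the SDCOR algorithm itself: the abstract does not specify the clustering procedure or the scoring criterion, so the sketch swaps in a simple nearest-centroid rule and a spread-normalized distance. All names here (`ClusterSummary`, `fit_summaries`, `outlier_scores`, `join_radius`, `min_size`) are hypothetical and chosen only for illustration.

```python
import math

class ClusterSummary:
    """Sufficient statistics for one cluster, so raw points need not be kept."""
    def __init__(self, point):
        self.n = 1
        self.s = list(point)                 # per-dimension sum
        self.ss = [x * x for x in point]     # per-dimension sum of squares

    def add(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.s[i] += x
            self.ss[i] += x * x

    def centroid(self):
        return [v / self.n for v in self.s]

    def spread(self):
        # Square root of the summed per-dimension variances, floored to avoid
        # division by zero for degenerate clusters.
        var = sum(max(ss / self.n - (s / self.n) ** 2, 0.0)
                  for s, ss in zip(self.s, self.ss))
        return max(math.sqrt(var), 1e-9)

def chunks(points, size):
    """Yield successive buffers of at most `size` points (the 'memory loads')."""
    buf = []
    for p in points:
        buf.append(p)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf

def fit_summaries(points, chunk_size, join_radius):
    """Pass 1: update cluster summaries one chunk at a time.

    Assumption (not from the paper): a point joins the nearest cluster if its
    centroid is within `join_radius`, otherwise it starts a new cluster.
    """
    clusters = []
    for chunk in chunks(points, chunk_size):
        for p in chunk:
            best = min(clusters, key=lambda c: math.dist(p, c.centroid()),
                       default=None)
            if best is not None and math.dist(p, best.centroid()) <= join_radius:
                best.add(p)
            else:
                clusters.append(ClusterSummary(p))
    return clusters

def outlier_scores(points, clusters, min_size):
    """Pass 2: score each point by its spread-normalized distance to the
    nearest cluster that has at least `min_size` members."""
    dense = [c for c in clusters if c.n >= min_size] or clusters
    return [min(math.dist(p, c.centroid()) / c.spread() for c in dense)
            for p in points]

# Two tight groups plus one isolated point at (5, 5).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (-0.1, 0.0),
        (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (9.9, 10.0), (10.0, 9.9),
        (5.0, 5.0)]
clusters = fit_summaries(data, chunk_size=4, join_radius=2.0)
scores = outlier_scores(data, clusters, min_size=3)
```

In this toy run the isolated point ends up in its own singleton summary, which `min_size` excludes from scoring, so it receives by far the largest score. Only the per-cluster summaries persist between chunks, which is what keeps memory use bounded in this sketch.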
