Conference: International Conference on Data Engineering

DBSCOUT: A Density-based Method for Scalable Outlier Detection in Very Large Datasets


Abstract

Recent technological advancements have made it possible to generate and collect huge amounts of data every day. This data is used for different purposes that may affect us on an unprecedented scale. Understanding the data, including detecting its outliers, is a critical step before using it. Outlier detection has been well studied in the literature, but existing approaches fail to scale to such very large settings. In this paper, we propose DBSCOUT, an efficient exact algorithm for outlier detection with linear complexity that can run in parallel over multiple independent machines, making it a fit for settings with billions of tuples. Besides the theoretical analysis, our experimental results confirm orders-of-magnitude improvement over existing work, demonstrating the efficiency, scalability, and effectiveness of our approach.
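The abstract does not spell out the algorithm itself. As a rough illustration only, the sketch below assumes the DBSCAN-style notion of an outlier: a point that is neither a core point (at least `min_pts` points within distance `eps`, itself included) nor within `eps` of a core point. The function name `density_outliers`, the parameters `eps` and `min_pts`, and the uniform-grid lookup are all assumptions made for this example; this is not the DBSCOUT implementation and it makes no claim about the paper's complexity bounds or parallel design.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def density_outliers(points, eps, min_pts):
    """Return indices of points that are neither core points nor within eps of one.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Bucket points into a uniform grid with cell side eps, so every
    # neighbor within eps of a point lies in its own or an adjacent cell.
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(floor(c / eps) for c in p)].append(i)

    def neighbors(i):
        p = points[i]
        cell = tuple(floor(c / eps) for c in p)
        # Scan the 3^d block of cells centered on the point's own cell.
        for offset in product((-1, 0, 1), repeat=len(cell)):
            key = tuple(a + b for a, b in zip(cell, offset))
            for j in grid.get(key, ()):
                if dist(p, points[j]) <= eps:
                    yield j

    # A core point has at least min_pts points (itself included) within eps.
    core = {
        i for i in range(len(points))
        if sum(1 for _ in neighbors(i)) >= min_pts
    }
    # Outliers are non-core points with no core point within eps.
    return [
        i for i in range(len(points))
        if i not in core and not any(j in core for j in neighbors(i))
    ]
```

For example, calling `density_outliers` on four tightly packed 2-D points plus one distant point, with `eps=0.5` and `min_pts=3`, flags only the distant point. The grid keeps each neighborhood query local to a fixed block of cells, which is one common way such methods avoid comparing every pair of points.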
