Journal: Concurrency and Computation: Practice and Experience

Large dataset summarization with automatic parameter optimization and parallel processing for local outlier detection

Abstract

As one of the most important research problems in data analytics and data mining, outlier detection from large datasets has drawn considerable research attention in recent years. In this paper, we investigate the research problem of summarizing large datasets to support efficient local outlier detection. We propose efficient summarization algorithms that produce a highly compact summary of the original dataset, which can then be applied to detect local outliers in future similar datasets. A novel automatic parameter optimization method is proposed to produce the optimal setup of the key parameters used in the summarization algorithm. Parallel processing methods are also proposed to significantly accelerate the summarization process. The experimental evaluation results demonstrate that our proposed algorithms are highly scalable for large datasets and effective in producing dataset summaries for local outlier detection.
