
Distributed Discord Discovery: Spark Based Anomaly Detection in Time Series



Abstract

The computational complexity of discord discovery is O(m^2), where m is the size of the time series. Many promising methods have been proposed to address this compute-intensive problem, but they discover discords sequentially on a standalone machine. The limited computing and memory capacity of a standalone machine prevents these methods from discovering discords in large datasets in reasonable time. In this work, we propose a distributed discord discovery method. Our method is able to combine discord results from different computing nodes, which were non-combinable in previous literature. We mitigate the memory wall issue by using distributed data partitioning. We implement our method on the distributed Spark computing framework and the distributed HDFS (Hadoop Distributed File System) storage platform. The implementation exhibits superior scalability and enables discord discovery in multi-dimensional time series. We evaluate our method with a terabyte-sized dataset, which is larger than any dataset in previous literature. Evaluation results show that our method has a clear advantage in performance and efficiency over state-of-the-art algorithms.
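To illustrate why the problem is compute-intensive, the following is a minimal single-machine sketch of brute-force discord discovery, not the distributed method proposed in the paper. It assumes the common definition of a discord as the subsequence whose distance to its nearest non-overlapping neighbor is largest, and it uses plain (non-normalized) Euclidean distance for simplicity. The function name `brute_force_discord` and the parameter `n` (subsequence length) are hypothetical names chosen for this example.

```python
import math


def brute_force_discord(series, n):
    """Return (start_index, distance) of the top discord of length n.

    Illustrative assumption: the discord is the subsequence with the
    largest Euclidean distance to its nearest non-overlapping neighbor.
    """
    m = len(series) - n + 1  # number of candidate subsequences
    best_idx, best_dist = -1, -1.0
    for i in range(m):
        nearest = math.inf
        for j in range(m):
            # Skip trivial matches: subsequences that overlap with i.
            if abs(i - j) < n:
                continue
            d = math.dist(series[i:i + n], series[j:j + n])
            nearest = min(nearest, d)
        # Keep the candidate whose nearest neighbor is farthest away.
        if nearest != math.inf and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist
```

The two nested loops over all subsequences give the quadratic number of distance computations mentioned in the abstract, which is what makes a single-machine approach impractical for very large series.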
