Journal: Concurrency and Computation: Practice and Experience

A framework for scalable real-time anomaly detection over voluminous, geospatial data streams



Abstract

This study presents a framework to enable distributed detection, storage, and analysis of anomalies in voluminous data streams. Individual observations within these streams are multidimensional, with each dimension corresponding to a feature of interest. We consider time-series geospatial datasets generated by remote and in situ observational devices. Three aspects make this problem particularly challenging: (1) the cumulative volume and rates of data arrivals, (2) evolution of the datasets over time, and (3) spatiotemporal correlations associated with the data. Further, solutions must minimize user intervention and be amenable to distributed processing to ensure scalability. Our approach achieves accurate, high-throughput classifications in real time, which we demonstrate with our reference anomaly detector implementations. We also provide interfaces that allow new implementations to be developed and parallelized automatically, ensuring applicability across problem domains. To help quantify the magnitude of anomalous observations, detector implementations provide a corresponding degree of irregularity. We have incorporated these algorithms into our distributed storage platform, Galileo, and profiled their suitability through empirical analysis that demonstrates high throughput (10 000 observations per second, per node) on a real-world petabyte dataset.
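
The abstract does not specify the detector interface or the reference algorithms, so the following is only a minimal sketch of the general idea: a detector that consumes multidimensional observations one at a time, adapts as the stream evolves, and reports a degree of irregularity rather than a yes/no label. The names `AnomalyDetector`, `RunningZScoreDetector`, `score`, and `update` are hypothetical and are not taken from Galileo. The scoring rule (largest per-dimension z-score, with running statistics maintained by Welford's online algorithm) is a stand-in technique chosen for brevity, not the authors' method.

```python
import math
from abc import ABC, abstractmethod
from typing import Sequence


class AnomalyDetector(ABC):
    """Hypothetical interface: score an observation, then learn from it."""

    @abstractmethod
    def score(self, observation: Sequence[float]) -> float:
        """Return a degree of irregularity (0.0 means entirely typical)."""

    @abstractmethod
    def update(self, observation: Sequence[float]) -> None:
        """Fold the observation into the detector's running state."""


class RunningZScoreDetector(AnomalyDetector):
    """Stand-in detector: largest per-dimension z-score seen so far."""

    def __init__(self, dimensions: int) -> None:
        self.count = 0
        self.mean = [0.0] * dimensions
        self.m2 = [0.0] * dimensions  # running sum of squared deviations

    def score(self, observation: Sequence[float]) -> float:
        if self.count < 2:
            return 0.0  # too little history to judge
        worst = 0.0
        for x, mean, m2 in zip(observation, self.mean, self.m2):
            std = math.sqrt(m2 / (self.count - 1))  # sample standard deviation
            if std > 0.0:
                worst = max(worst, abs(x - mean) / std)
        return worst

    def update(self, observation: Sequence[float]) -> None:
        # Welford's online update: constant memory per dimension.
        self.count += 1
        for i, x in enumerate(observation):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])


# Usage with made-up (temperature, humidity) readings.
detector = RunningZScoreDetector(dimensions=2)
for obs in [(20.0, 0.50), (21.0, 0.52), (19.0, 0.48), (20.0, 0.50)]:
    detector.update(obs)

typical = detector.score((20.5, 0.51))
unusual = detector.score((35.0, 0.50))
print(f"typical={typical:.2f} unusual={unusual:.2f}")
```

Because each detector instance keeps only constant-size state, one instance per node or per spatial partition could run independently, which is one plausible way such an interface lends itself to parallelization. This sketch ignores the spatiotemporal correlations the abstract highlights.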
