Journal: Knowledge and Information Systems

Disk aware discord discovery: finding unusual time series in terabyte sized datasets


Abstract

The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many real-world problems this is not the case. For example, in astronomy, multi-terabyte time series datasets are the norm. Most current algorithms, when faced with data that cannot fit in main memory, resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can be discovered with a disk aware algorithm. The proposed algorithm is exact and requires only two linear scans of the disk with a tiny buffer of main memory. Furthermore, it is very simple to implement. We use the algorithm to provide further evidence of the effectiveness of the discord definition in areas as diverse as astronomy, web query mining, video surveillance, etc., and show the efficiency of our method on datasets which are many orders of magnitude larger than anything else attempted in the literature.
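The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of one way a two-scan discord search could be organized. It is not the authors' implementation. It assumes that a discord is the series whose distance to its nearest neighbor is largest, that distance is Euclidean, and that the caller supplies a range threshold `r` no larger than the true discord distance. The names `scan`, `select_candidates`, `refine`, and `top_discord` are hypothetical. `scan` stands in for one linear pass over the disk, and the candidate dictionary plays the role of the small in-memory buffer.

```python
import math
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Series = Sequence[float]
# Hypothetical stand-in for one linear pass over the disk:
# each call yields (index, series) pairs in storage order.
Scan = Callable[[], Iterable[Tuple[int, Series]]]


def euclidean(a: Series, b: Series) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_candidates(scan: Scan, r: float) -> Dict[int, Series]:
    """First scan: keep only series that have no neighbor within r so far."""
    candidates: Dict[int, Series] = {}
    for idx, s in scan():
        is_candidate = True
        for c_idx in list(candidates):
            if euclidean(s, candidates[c_idx]) < r:
                # Both series have a neighbor closer than r,
                # so neither can be a discord at range r.
                del candidates[c_idx]
                is_candidate = False
        if is_candidate:
            candidates[idx] = s
    return candidates


def refine(scan: Scan, candidates: Dict[int, Series], r: float) -> Dict[int, float]:
    """Second scan: compute exact nearest-neighbor distances of survivors."""
    nn = {idx: math.inf for idx in candidates}
    for idx, s in scan():
        for c_idx in list(nn):
            if c_idx == idx:
                continue  # do not compare a series with itself
            d = euclidean(s, candidates[c_idx])
            if d < r:
                del nn[c_idx]  # false positive from the first scan
            elif d < nn[c_idx]:
                nn[c_idx] = d
    return nn


def top_discord(scan: Scan, r: float) -> Optional[Tuple[int, float]]:
    """Return (index, nearest-neighbor distance), or None if r was too large."""
    nn = refine(scan, select_candidates(scan, r), r)
    if not nn:
        return None
    best = max(nn, key=lambda i: nn[i])
    return best, nn[best]
```

As a usage example, with `data = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]` and `scan = lambda: enumerate(data)`, calling `top_discord(scan, 2.0)` selects index 4, the series far from the cluster. A `None` result means no series has a nearest neighbor at distance `r` or more, so the search would need to be repeated with a smaller `r`. In this sketch the memory footprint depends on how many candidates survive the first scan, which in turn depends on the choice of `r`.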
