Mining the frequency of time-constrained serial episodes over massive data sequences and streams

Abstract

With the growth of the Internet, telecommunications, industrial systems, and similar technologies, massive event sequences and streams have been, and continue to be, produced. These sequences and streams are generated at a fast pace, posing major challenges for computation and analysis. On the one hand, because of the huge number of events, analyzing the sequences is time-consuming. On the other hand, because events in a stream do not necessarily arrive at a uniform rate, an effective computational model over the stream must accommodate bursts of event arrivals. In this work, we focus on frequency evaluation, a representative task in sequence and stream analysis. To address these challenges, we present a one-pass algorithm, ONCE, which outputs a widely used frequency measure for a given sequence. We also present two advanced models built on ONCE: SparkONCE and StreamingONCE. With a series of non-trivial strategies carefully designed for Spark, SparkONCE and StreamingONCE outperform ONCE. In particular, compared with ONCE, SparkONCE significantly improves efficiency on massive sequences, and StreamingONCE adapts effectively to the uneven arrival rate of events in a stream. An experimental study on real-world and synthetic datasets demonstrates that the proposed approaches work well on massive sequences and streams.

Bibliographic details

  • Source
    Future Generation Computer Systems | 2020, Issue 9 | pp. 849-863 | 15 pages
  • Author affiliations

    School of Cyber Engineering, Xidian University, 710071 Xi'an, China; State Key Laboratory on Integrated Services Networks, Xidian University, China;

    School of Cyber Engineering, Xidian University, 710071 Xi'an, China;

    School of Cyber Engineering, Xidian University, 710071 Xi'an, China;

    Department of Computer Science and Engineering, The Chinese University of Hong Kong, China;

    School of Cyber Engineering, Xidian University, 710071 Xi'an, China;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Spark; Sequence mining; Serial episode; Frequency; Stream;

