Source: 電子情報通信学会技術研究報告 (IEICE Technical Report)

Data Stream Processing Research at IMC of East China Normal University



Abstract

Data stream processing has been attracting growing attention in the research and industry communities because of its broad potential applications. In this talk, we briefly introduce the research work done in our group. Our research interests in data streams are frequent item(set) mining, clustering, and burst detection. Some work on practical applications and some considerations for future work are introduced as well.

For the basic problem of mining frequent items over data streams, we propose an algorithm called hCount. It has low space complexity and low per-tuple processing cost, and it achieves high recall and precision. For mining frequent itemsets, we develop a new false-negative frequent itemset mining algorithm. It obtains a condensed representation of the frequent itemsets in a transactional data stream by discovering a false-negative collection of special itemsets that, with high probability, covers the frequent itemsets with respect to the set inclusion relationship among itemsets.

Our research on data stream mining has focused on clustering of data streams. SWClustering is the algorithm we proposed to cluster data streams over sliding windows, and EHCF (Exponential Histogram of Cluster Features) is the synopsis that maintains the statistical information of clusters in sliding windows. With SWClustering, both the changing distribution of clusters and the evolving behavior of individual clusters can be captured. CluDistream clusters distributed data streams and can effectively handle a huge volume of data containing noisy, corrupted, or incomplete records generated in a distributed environment. In CluDistream, which is based on the EM (Expectation Maximization) algorithm, each data record is assigned to a cluster with a certain degree of membership.

The other important piece of work is on burst detection, or monitoring, over data streams. The fractal analysis method is adapted to enable the monitoring of both monotonic and non-monotonic aggregates on time-changing data streams. The monotonicity property of aggregate monitoring is revealed, and a monotonic search space is built to reduce the time overhead of detecting bursts from O(m) to O(log m), where m is the number of windows to be monitored. With the help of a novel piecewise fractal model, the statistical summary is compressed to fit in limited main memory, so that high aggregates on windows of any length can be detected accurately and efficiently online.

A practical data stream processing system for telecommunication network flow data analysis will also be introduced in this talk.
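
As a rough illustration of why a monotonic search space can cut burst detection from O(m) to O(log m) window checks, the sketch below makes several assumptions that the abstract does not state: the aggregate is a sum over non-negative values, the m windows are nested and all end at the most recent item, and a single threshold is shared by all windows. It is not the piecewise fractal model itself, and the name `smallest_bursting_window` and its parameters are hypothetical.

```python
from itertools import accumulate


def smallest_bursting_window(values, window_lengths, threshold):
    """Return the shortest window length whose sum reaches the threshold.

    Assumptions (illustrative only, not taken from the abstract):
      * values are non-negative, so the sum over the most recent k items
        never decreases as k grows (the aggregate is monotonic);
      * window_lengths is sorted in ascending order;
      * every window ends at the most recent item.
    Returns None when no window reaches the threshold.
    """
    # prefix[i] holds the sum of the first i values. A streaming system
    # would maintain such a summary incrementally rather than rebuild it.
    prefix = [0] + list(accumulate(values))
    n = len(values)

    def window_sum(length):
        # Sum of the most recent `length` items, clipped to the data seen.
        length = min(length, n)
        return prefix[n] - prefix[n - length]

    # Because window_sum is non-decreasing in the window length, the
    # windows that reach the threshold form a suffix of window_lengths.
    # Binary search finds the start of that suffix in O(log m) checks.
    lo, hi = 0, len(window_lengths)
    while lo < hi:
        mid = (lo + hi) // 2
        if window_sum(window_lengths[mid]) >= threshold:
            hi = mid
        else:
            lo = mid + 1
    return window_lengths[lo] if lo < len(window_lengths) else None
```

For example, with `values = [1, 0, 2, 5, 3, 4]` and `window_lengths = [1, 2, 3, 4, 5, 6]`, the sums over the most recent 1 to 6 items are 4, 7, 12, 14, 14, and 15, so a threshold of 10 is first reached by the window of length 3. A non-monotonic aggregate such as an average would not satisfy the suffix property, so this shortcut would not apply to it directly.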
