Scalable real-time classification of data streams with concept drift

Abstract

Inducing adaptive predictive models in real time from high-throughput data streams is one of the most challenging areas of Big Data Analytics. Because data streams are unbounded and may contain concept drifts (changes over time in the pattern encoded in the stream), they impose unique challenges compared with predictive data mining from batch data. Several real-time predictive data stream algorithms exist; however, most are not naturally parallel and are therefore limited in their scalability. This paper highlights the Micro-Cluster Nearest Neighbour (MC-NN) data stream classifier. MC-NN is based on statistical summaries of the data stream and a nearest neighbour approach, which makes it naturally parallel. In its serial version, MC-NN is able to handle data streams: the data does not need to reside in memory and is processed incrementally. MC-NN is also able to adapt to concept drifts. This paper provides an empirical study of the serial algorithm's speed, adaptivity and accuracy. Furthermore, it discusses the new parallel implementation of MC-NN and its parallel properties, and provides an empirical scalability study.
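To make the abstract's description more concrete, here is a minimal Python (3.8+) sketch of a micro-cluster nearest-neighbour stream classifier. It is not the authors' implementation. Each micro-cluster keeps only additive statistics (count, per-feature sum, per-feature sum of squares) instead of the raw instances, and prediction returns the class of the micro-cluster with the nearest centroid. The class names `MicroCluster` and `MCNNSketch`, the parameters `error_threshold` and `max_age`, the error-counter update, the split rule (replace a micro-cluster whose error counter exceeds the threshold by two seeds one standard deviation either side of its centroid along its highest-variance feature) and the age-based forgetting rule are all illustrative assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: list.remove() then matches by identity
class MicroCluster:
    """Running statistical summary of the instances of one class."""
    label: str
    n: int = 0
    cf1: list[float] = field(default_factory=list)  # per-feature sum
    cf2: list[float] = field(default_factory=list)  # per-feature sum of squares
    errors: int = 0       # misclassification counter
    last_absorb: int = 0  # time step of the last absorbed instance

    def absorb(self, x: list[float], t: int) -> None:
        if not self.cf1:
            self.cf1, self.cf2 = [0.0] * len(x), [0.0] * len(x)
        for i, v in enumerate(x):
            self.cf1[i] += v
            self.cf2[i] += v * v
        self.n += 1
        self.last_absorb = t

    def centroid(self) -> list[float]:
        return [s / self.n for s in self.cf1]

    def variances(self) -> list[float]:
        # Var = E[x^2] - E[x]^2, clamped at 0 against rounding error.
        return [max(s2 / self.n - m * m, 0.0)
                for s2, m in zip(self.cf2, self.centroid())]


class MCNNSketch:
    """Hypothetical micro-cluster nearest-neighbour stream classifier."""

    def __init__(self, error_threshold: int = 2, max_age: int = 50):
        self.error_threshold = error_threshold  # assumed split trigger
        self.max_age = max_age                  # assumed forgetting horizon
        self.clusters: list[MicroCluster] = []
        self.t = 0

    def _nearest(self, x: list[float], label: str | None = None):
        pool = [mc for mc in self.clusters if label is None or mc.label == label]
        return min(pool, key=lambda mc: math.dist(x, mc.centroid()), default=None)

    def predict(self, x: list[float]) -> str | None:
        mc = self._nearest(x)
        return None if mc is None else mc.label

    def learn_one(self, x: list[float], y: str) -> None:
        """Process one labelled instance; the raw instance is not stored."""
        self.t += 1
        nearest = self._nearest(x)
        if nearest is not None and nearest.label == y:
            # Correct nearest micro-cluster: absorb and relax its error counter.
            nearest.absorb(x, self.t)
            nearest.errors = max(nearest.errors - 1, 0)
        else:
            # Wrong (or no) nearest micro-cluster: absorb into nearest of class y.
            own = self._nearest(x, y)
            if own is None:
                own = MicroCluster(y)
                self.clusters.append(own)
            own.absorb(x, self.t)
            if nearest is not None:
                for mc in (nearest, own):
                    mc.errors += 1
                    if mc.errors > self.error_threshold:
                        self._split(mc)
        # Drop micro-clusters that absorbed nothing in the last max_age steps.
        self.clusters = [mc for mc in self.clusters
                         if self.t - mc.last_absorb <= self.max_age]

    def _split(self, mc: MicroCluster) -> None:
        # Replace mc by two seeds one standard deviation either side of its
        # centroid along its highest-variance feature (an assumed rule).
        var = mc.variances()
        d = max(range(len(var)), key=var.__getitem__)
        sd = math.sqrt(var[d])
        centre = mc.centroid()
        self.clusters.remove(mc)
        for sign in (-1.0, 1.0):
            seed = list(centre)
            seed[d] += sign * sd
            child = MicroCluster(mc.label)
            child.absorb(seed, mc.last_absorb)  # children inherit parent's age
            self.clusters.append(child)
```

In this sketch, drift adaptation comes from two places: micro-clusters that keep misclassifying are split, and micro-clusters of a superseded concept stop absorbing instances and are forgotten after `max_age` steps. The paper's own removal criterion may differ. Because `n`, `cf1` and `cf2` are plain sums, summaries built on different workers could be combined by addition, and the nearest-micro-cluster search could be split across partitions with only the per-partition minima compared. This is one plausible reading of why the abstract calls MC-NN naturally parallel, not a description of the paper's parallel implementation.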