Journal: Knowledge-Based Systems

Evolving large-scale data stream analytics based on scalable PANFIS


Abstract

The main challenge in large-scale data stream analytics is enabling machine learning to extract knowledge from large-scale data within a reasonable timeframe without a loss of accuracy. Many distributed machine learning frameworks have recently been built to speed up learning from large-scale data. However, most of the algorithms used in these frameworks are still offline models, which cannot cope with data streams. In practice, large-scale data are mostly generated by non-stationary data streams whose patterns evolve over time. To address this problem, we propose a novel Evolving Large-scale Data Stream Analytics framework based on a Scalable Parsimonious Network based on Fuzzy Inference System (Scalable PANFIS), in which the PANFIS evolving algorithm is distributed over worker nodes in the cloud to learn large-scale data streams. The Scalable PANFIS framework incorporates an active learning (AL) strategy and two model fusion methods. AL accelerates the distributed learning process that produces an initial evolving large-scale data stream model (the initial model), and the two model fusion methods aggregate the initial model into the final model. The final model represents the updated knowledge of the current large-scale data and can be used to make inferences on future data. The framework is validated through extensive experiments that measure the accuracy and running time of four combinations of Scalable PANFIS and of other built-in Spark algorithms. The results indicate that Scalable PANFIS with AL trains almost twice as fast as Scalable PANFIS without AL. The results also show that the rule-merging and voting mechanisms generally yield similar accuracy across the Scalable PANFIS variants, and that both are generally more accurate than the Spark-based algorithms. In terms of running time, Scalable PANFIS trains faster than all Spark-based algorithms when classifying a multi-class dataset. (C) 2019 Elsevier B.V. All rights reserved.
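
The abstract describes the framework only at a high level. The following is a minimal Python sketch of the general pattern it names: data are split across workers, each worker learns in a single pass with an active-learning filter, and the workers' predictions are fused by majority voting. The sketch is not PANFIS and makes several assumptions. The per-worker model (`StreamingCentroidModel`) is a hypothetical running class-centroid classifier. The margin-based uncertainty criterion and `margin_threshold` are assumptions. Round-robin partitioning in a plain loop stands in for Spark's distribution to worker nodes. The paper's rule-merging fusion is not shown.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]
Sample = Tuple[Vector, int]  # (features, class label)


class StreamingCentroidModel:
    """Hypothetical stand-in for one worker's evolving model (NOT PANFIS).

    Keeps a running mean per class and updates it one sample at a time."""

    def __init__(self) -> None:
        self.sums: Dict[int, List[float]] = {}
        self.counts: Dict[int, int] = {}

    def _sq_dist(self, x: Vector, label: int) -> float:
        n = self.counts[label]
        return sum((xi - s / n) ** 2 for xi, s in zip(x, self.sums[label]))

    def predict_with_margin(self, x: Vector) -> Tuple[Optional[int], float]:
        """Return (predicted label, margin); a small margin means uncertainty."""
        if len(self.counts) < 2:
            # Fewer than two classes seen: treat every sample as uncertain.
            return next(iter(self.counts), None), 0.0
        ranked = sorted((self._sq_dist(x, c), c) for c in self.counts)
        return ranked[0][1], ranked[1][0] - ranked[0][0]

    def update(self, x: Vector, y: int) -> None:
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + xi for s, xi in zip(self.sums[y], x)]
        self.counts[y] += 1


def train_worker(partition: Sequence[Sample], margin_threshold: float):
    """Single pass over one partition with an uncertainty-based AL filter.

    Returns the model and the number of samples actually used for updates."""
    model = StreamingCentroidModel()
    used = 0
    for x, y in partition:
        _, margin = model.predict_with_margin(x)
        if margin < margin_threshold:  # learn only from uncertain samples
            model.update(x, y)
            used += 1
    return model, used


def vote(models: Sequence[StreamingCentroidModel], x: Vector) -> Optional[int]:
    """Majority-vote fusion of the workers' predictions."""
    preds = [m.predict_with_margin(x)[0] for m in models]
    preds = [p for p in preds if p is not None]
    if not preds:
        return None
    return Counter(preds).most_common(1)[0][0]


def run(stream: Sequence[Sample], n_workers: int = 3, margin_threshold: float = 1.0):
    # Round-robin partitioning stands in for distributing data to cluster workers.
    partitions = [stream[i::n_workers] for i in range(n_workers)]
    results = [train_worker(p, margin_threshold) for p in partitions]
    return [m for m, _ in results], sum(u for _, u in results)


if __name__ == "__main__":
    # Two well-separated synthetic classes, interleaved in the stream.
    stream = [((0.1 * (i % 5), 0.0), 0) if i % 2 == 0 else ((5.0 + 0.1 * (i % 5), 5.0), 1)
              for i in range(60)]
    models, used = run(stream)
    print(vote(models, (0.2, 0.1)), vote(models, (4.8, 5.1)), used)
```

On this toy stream, each worker updates only on its first two samples. After that, every sample is classified with a large margin and skipped, so only 6 of the 60 samples are used for training. This illustrates, under the stated assumptions, how an active-learning filter can cut training work. Voting is used here because it needs only each worker's predictions, not access to its internal structure.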