Adaptive Random Forests for Evolving Data Stream Classification

Abstract

Random forests are currently among the most widely used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to their high learning performance and low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, there is no random forest algorithm that can be considered state-of-the-art in comparison to bagging- and boosting-based algorithms. In this work, we present the adaptive random forest (ARF) algorithm for classification of evolving data streams. In contrast to previous attempts at replicating random forests for data stream learning, ARF includes an effective resampling method and adaptive operators that can cope with different types of concept drift without complex optimizations for different data sets. We present experiments with a parallel implementation of ARF, which shows no degradation in classification performance in comparison to a serial implementation, since trees and adaptive operators are independent of one another. Finally, we compare ARF with state-of-the-art algorithms in a traditional test-then-train evaluation and a novel delayed labelling evaluation, and show that ARF is accurate and uses a feasible amount of resources.
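The test-then-train evaluation mentioned in the abstract can be illustrated with a minimal sketch. Each instance is first used to test the model and only afterwards to train it, so accuracy is always measured on unseen data. The names `MajorityClassLearner` and `prequential_accuracy` are hypothetical, and the learner is a trivial stand-in rather than ARF itself.

```python
from collections import Counter


class MajorityClassLearner:
    """Hypothetical stand-in for a stream classifier (not ARF):
    predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any label has been seen there is nothing to predict.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Test-then-train loop: predict on each instance, then learn from it."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:  # test first ...
            correct += 1
        model.learn(x, y)          # ... then train on the same instance
        total += 1
    return correct / total if total else 0.0


# Toy stream of (features, label) pairs, invented for illustration only.
stream = [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a")]
accuracy = prequential_accuracy(stream, MajorityClassLearner())
```

Any classifier exposing `predict` and `learn` could be plugged into the same loop. A delayed labelling evaluation would differ in that `learn` is called only after a lag, which this sketch does not model.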
