Journal: Pattern Recognition Letters

Strict Very Fast Decision Tree: A memory conservative algorithm for data stream mining

Abstract

Memory and time constraints are current challenges when learning from data streams with a massive amount of data. Many algorithms have been proposed to handle these difficulties, among them the Very Fast Decision Tree (VFDT) algorithm. Although the VFDT has been widely used in data stream mining, in recent years several authors have suggested modifications that increase its predictive performance while setting memory concerns aside, resulting in memory-costly solutions. In addition, most data stream mining solutions have centred on ensembles, whose memory cost is the combined cost of their weak learners, usually VFDTs. To reduce memory costs while preserving predictive performance, this study proposes the Strict VFDT (SVFDT), a novel algorithm based on the VFDT. The SVFDT algorithm minimises unnecessary tree growth, substantially reducing memory usage while keeping predictive performance competitive. Moreover, since it creates much shallower trees than the VFDT, the SVFDT can achieve a shorter processing time. Experiments compared the SVFDT with the VFDT on 11 benchmark data stream datasets, assessing the trade-off between accuracy, memory, and processing time. Statistical analysis showed that the proposed algorithm obtained similar predictive performance and significantly reduced processing time and memory use. Thus, the SVFDT is a suitable option for data stream mining under memory and time limitations, and is recommended as a weak learner in ensemble-based solutions. (C) 2018 Elsevier B.V. All rights reserved.
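For readers unfamiliar with the VFDT baseline named in the abstract, the following is a minimal sketch of the standard Hoeffding-bound split test that the VFDT is generally known to use when deciding whether a leaf should grow. It is not taken from this paper: the function names, the default `delta`, and the default `tie_threshold` are illustrative assumptions, and the additional constraints that the SVFDT places on tree growth are not described in the abstract and are therefore not shown.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n_seen: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n_seen))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 n_seen: int, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """VFDT-style split decision at a leaf (hypothetical helper).

    Assumes information gain as the split heuristic, whose range R is
    log2(number of classes). The defaults are illustrative, not the
    paper's settings.
    """
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n_seen)
    # Split when the best attribute is confidently better than the runner-up,
    # or when the bound is so tight that the two candidates are effectively tied.
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    print(should_split(0.30, 0.05, n_classes=2, n_seen=200))
    print(should_split(0.30, 0.25, n_classes=2, n_seen=200))
    print(should_split(0.30, 0.25, n_classes=2, n_seen=5000))
```

Because every split allocates new leaves with their own sufficient statistics, a stricter decision than the one sketched here is one plausible route to the lower memory use the abstract reports.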