
Stacking Bagged and Boosted Forests for Effective Automated Classification



Abstract

Random Forest (RF) is one of the most successful strategies for automated classification tasks. Motivated by the RF success, recently proposed RF-based classification approaches leverage the central RF idea of aggregating a large number of low-correlated trees, which are inherently parallelizable and provide exceptional generalization capabilities. In this context, this work brings several new contributions to this line of research. First, we propose a new RF-based strategy (BERT) that applies the boosting technique in bags of extremely randomized trees. Second, we empirically demonstrate that this new strategy, as well as the recently proposed BROOF and LazyNN_RF classifiers, do complement each other, motivating us to stack them to produce an even more effective classifier. To our knowledge, this is the first strategy to effectively combine the three main ensemble strategies: stacking, bagging (the cornerstone of RFs) and boosting. Finally, we exploit the efficient and unbiased stacking strategy based on out-of-bag (OOB) samples to considerably speed up the very costly training process of the stacking procedure. Our experiments on several datasets covering two high-dimensional and noisy domains of topic and sentiment classification provide strong evidence in favor of the benefits of our RF-based solutions.
We show that BERT is among the top performers in the vast majority of analyzed cases, while retaining the unique benefits of RF classifiers (explainability, parallelization, ease of parameterization). We also show that stacking only the recently proposed RF-based classifiers and BERT using our OOB-based strategy is not only significantly faster than recently proposed stacking strategies (up to six times) but also much more effective, with gains of up to 21% and 17% in MacroF1 and MicroF1, respectively, over the best base method, and of 5% and 6% over a stacking of traditional methods, performing no worse than a complete stacking of methods at a much lower computational effort.
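The OOB-based stacking idea mentioned in the abstract can be sketched as follows: each base forest's out-of-bag predictions serve as (nearly) unbiased meta-features, so the stacker needs no extra cross-validation pass over the training data. The base learners and meta-learner here are generic stand-ins, not the BROOF/LazyNN_RF/BERT stack evaluated in the paper:

```python
# Minimal sketch of OOB-based stacking (base and meta learners are
# assumptions; the paper stacks its own RF-based classifiers).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base forests; oob_score=True exposes out-of-bag class probabilities.
bases = [
    RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0),
    ExtraTreesClassifier(n_estimators=200, bootstrap=True, oob_score=True,
                         random_state=0),
]
for base in bases:
    base.fit(X_tr, y_tr)

# OOB predictions are made by trees that never saw the sample, so they
# can train the stacker directly -- no separate cross-validation fold loop.
meta_train = np.hstack([b.oob_decision_function_ for b in bases])
stacker = LogisticRegression().fit(meta_train, y_tr)

meta_test = np.hstack([b.predict_proba(X_te) for b in bases])
stack_accuracy = stacker.score(meta_test, y_te)
```

The speedup the abstract reports comes from exactly this shortcut: a conventional stacker must refit every base classifier across k folds to obtain unbiased meta-features, whereas the OOB predictions fall out of a single training pass per base forest.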

Bibliographic Details

  • Source
    ACM SIGIR FORUM | 2017, issue cd | pp. 105-114 | 10 pages
  • Author Affiliations

    Federal University of Minas Gerais Computer Science Department Av. Antonio Carlos 6627 - ICEx Belo Horizonte, Brazil

  • Indexing Information
  • Format: PDF
  • Language: eng
  • CLC Classification
  • Keywords

    Classification; Ensemble; Bagging; Boosting; Stacking;

