A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data

Abstract

Class imbalance presents a major hurdle in the application of classification methods. A common approach is to learn ensembles of classifiers using rebalanced data. Examples include bootstrap aggregating (bagging) combined with either undersampling or oversampling of the minority class examples. However, rebalancing methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases. Furthermore, these methods often require specifying the performance measure of interest a priori, i.e., before learning. An alternative is to employ the threshold-moving technique, which applies a threshold to the continuous output of a model, offering the possibility to adapt to a performance measure a posteriori, i.e., a plug-in method. Surprisingly, little attention has been paid to the combination of a bagging ensemble and threshold-moving. In this paper, we study this combination and demonstrate its competitiveness. In contrast to resampling methods, we preserve the natural class distribution of the data, resulting in well-calibrated posterior probabilities. Additionally, we extend the proposed method to handle multiclass data. We validate our method on binary and multiclass benchmark data sets using both decision trees and neural networks as base classifiers. We perform analyses that provide insights into the proposed method.
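
The binary case described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 1-D data, the decision-stump base learner, the choice of F1 as the target measure, and all function names (`make_data`, `fit_stump`, `fit_bagging`, `best_threshold`) are assumptions made for illustration. Bootstrap samples are drawn without any rebalancing, the ensemble output is the average of the members' probability estimates, and the threshold is chosen only after training.

```python
import random


def make_data(n_neg, n_pos, rng):
    """Toy 1-D imbalanced data (assumed): negatives ~ N(0, 1), positives ~ N(1.5, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    xs += [rng.gauss(1.5, 1.0) for _ in range(n_pos)]
    return xs, [0] * n_neg + [1] * n_pos


def fit_stump(xs, ys):
    """Base learner: a stump storing the positive rate on each side of one cut."""
    best = None
    for cut in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= cut]
        right = [y for x, y in zip(xs, ys) if x > cut]
        if not left or not right:
            continue
        p_left, p_right = sum(left) / len(left), sum(right) / len(right)
        # Squared error of the leaf probabilities on this bootstrap sample.
        err = sum((y - (p_left if x <= cut else p_right)) ** 2
                  for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, cut, p_left, p_right)
    if best is None:  # degenerate sample: every x is identical
        rate = sum(ys) / len(ys)
        return (float("inf"), rate, rate)
    return best[1:]


def stump_proba(stump, x):
    cut, p_left, p_right = stump
    return p_left if x <= cut else p_right


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Plain bagging: bootstrap with replacement, natural class ratio untouched."""
    rng = random.Random(seed)
    n = len(xs)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return ensemble


def ensemble_proba(ensemble, x):
    """Continuous ensemble output: mean of the members' probability estimates."""
    return sum(stump_proba(s, x) for s in ensemble) / len(ensemble)


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(probs, ys, metric=f1_score):
    """Threshold-moving: pick the cut on the ensemble output that maximizes `metric`."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: metric(ys, [int(p >= t) for p in probs]))


rng = random.Random(0)
x_train, y_train = make_data(200, 20, rng)
x_val, y_val = make_data(200, 20, rng)  # held-out split used only to pick the threshold

ensemble = fit_bagging(x_train, y_train)
val_probs = [ensemble_proba(ensemble, x) for x in x_val]

threshold = best_threshold(val_probs, y_val)
f1_default = f1_score(y_val, [int(p >= 0.5) for p in val_probs])
f1_moved = f1_score(y_val, [int(p >= threshold) for p in val_probs])
print(f"threshold={threshold:.3f}  F1@0.5={f1_default:.3f}  F1@moved={f1_moved:.3f}")
```

Because the performance measure enters only through `best_threshold`, switching to another measure means passing a different `metric` function, with no retraining of the ensemble. The two F1 values printed here are computed on the same split used to select the threshold, so they are optimistic; an unbiased comparison would need a separate test set.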