Corrective classification: Learning from data imperfections with aggressive and diverse classifier ensembling

Abstract

Learning from imperfect (noisy) information sources is a challenging, real-world issue for many data mining applications. Common practices include enhancing data quality with data preprocessing techniques, or employing robust learning algorithms that avoid building overly complicated structures that overfit the noise. The essential goal is to reduce the impact of noise and thereby improve the learners built from noise-corrupted data. In this paper, we propose a novel corrective classification (C2) design, which incorporates data cleansing, error correction, Bootstrap sampling and classifier ensembling for effective learning from noisy data sources. C2 differs from existing classifier ensembling and robust learning algorithms in two aspects. On the one hand, the diverse base learners that constitute the C2 ensemble are constructed via a Bootstrap sampling process; on the other hand, C2 further improves each base learner by unifying error detection, correction and data cleansing to reduce noise impact. Being corrective, the classifier ensemble is built from data preprocessed and corrected by the data cleansing and correcting modules. Experimental comparisons demonstrate that C2 is not only more accurate than the learner built from the original noisy sources, but also more reliable than Bagging [4] or aggressive classifier ensemble (ACE) [56], which are two degenerate components/variants of C2. The comparisons also indicate that C2 is more stable than Boosting and DECORATE, two state-of-the-art ensembling methods. For real-world imperfect information sources (i.e. noisy training and/or test data), C2 delivers more accurate and reliable prediction models than its peers.
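
The pipeline the abstract describes (Bootstrap sampling, then per-sample error detection and correction, then ensembling) can be illustrated with a minimal sketch. This is not the authors' C2 algorithm: the abstract does not specify the detection or correction rules, so the sketch assumes a simple stand-in in which an instance's label is replaced by the majority label of its k nearest neighbours within the same bootstrap bag, and each base learner is a plain k-NN classifier. All names (`fit_c2_sketch`, `cleanse`, `predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_label(points, x, k):
    """Majority label among the k points nearest to x; points are (features, label)."""
    nearest = sorted(points, key=lambda p: dist2(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cleanse(bag, k):
    """Assumed correction rule: relabel every bagged instance with the majority
    label of its k nearest *other* distinct instances in the same bag."""
    unique = {i: (x, y) for i, x, y in bag}
    corrected = []
    for i, x, _ in bag:
        others = [p for j, p in unique.items() if j != i]
        corrected.append((x, knn_label(others, x, k)))
    return corrected


def fit_c2_sketch(data, n_learners=11, k=3, seed=0):
    """Build one corrected bootstrap bag per base learner."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_learners):
        indices = [rng.randrange(len(data)) for _ in data]  # sample with replacement
        bag = [(i, data[i][0], data[i][1]) for i in indices]
        ensemble.append(cleanse(bag, k))
    return ensemble


def predict(ensemble, x, k=3):
    """Majority vote over the k-NN base learners, one per corrected bag."""
    votes = Counter(knn_label(bag, x, k) for bag in ensemble)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated 1-D clusters, with one flipped label in each.
data = [((i / 10,), 0) for i in range(10)] + [((5 + i / 10,), 1) for i in range(10)]
data[4] = ((0.4,), 1)    # label noise injected into cluster 0
data[15] = ((5.5,), 0)   # label noise injected into cluster 1

ensemble = fit_c2_sketch(data)
print(predict(ensemble, (0.4,)), predict(ensemble, (5.5,)))
```

Dropping the `cleanse` call reduces the sketch to ordinary bagging over k-NN learners, which loosely mirrors the abstract's remark that Bagging is a degenerate variant of C2.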

Bibliographic details

  • Source
    Information Systems | 2011, Issue 8 | pp. 1135-1157 | 23 pages
  • Author affiliations

    Vermont Information Processing, 402 Watertower Circle, Colchester, Vermont 05446, United States;

    Centre for Quantum Computation & Intelligent Systems, University of Technology, Sydney, NSW 2007, Australia;

    School of Computer Science & Information Engineering, Hefei University of Technology, Hefei 230009, China, and Department of Computer Science, University of Vermont, Burlington VT 05405, United States;

    Department of Microbiology & Molecular Genetics, University of Vermont, Burlington VT 05405, United States;

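  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA;
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords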

    noisy data; error correction; bagging; bootstrap sampling; classifier ensemble;

  • Added to database: 2022-08-18 02:48:00
