Cost-sensitive boosting for classification of imbalanced data.



Abstract

Imbalanced class distributions significantly limit the performance attainable by most well-developed classification systems, which assume relatively balanced class distributions. The problem is especially critical in application domains such as medical diagnosis, fraud detection, and network intrusion, which are of great importance in machine learning and data mining.

This thesis explores meta-techniques that are applicable to most classifier learning algorithms, with the aim of advancing the classification of imbalanced data. Boosting is a powerful meta-technique that learns an ensemble of weak models with the promise of improving classification accuracy, and AdaBoost is widely regarded as the most successful boosting algorithm. The thesis starts by applying AdaBoost to an associative classifier to both reduce learning time and improve accuracy. However, the promise of improved accuracy matters little under class imbalance, where accuracy is a less meaningful measure. Insight gained from a comprehensive analysis of AdaBoost's boosting strategy leads to the investigation of cost-sensitive boosting algorithms, which are developed by introducing cost items into the learning framework of AdaBoost. The cost items denote the uneven identification importance among classes, so that the boosting strategies can intentionally bias learning towards classes with higher identification importance and eventually improve identification performance on them. For a given application domain, the cost values for different types of samples, which the proposed cost-sensitive boosting algorithms require, are usually unavailable. To set effective cost values, empirical methods are used for bi-class applications, and heuristic search with a Genetic Algorithm is employed for multi-class applications.

This thesis also covers the implementation of the proposed cost-sensitive boosting algorithms. It ends with a discussion of the experimental results on classifying real-world imbalanced data. Compared with existing algorithms, the new algorithms presented in this thesis achieve better measurements with respect to the learning objectives.
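The idea of introducing cost items into AdaBoost can be illustrated with a minimal sketch. The abstract does not give the exact update rules, so everything below is an assumption made for illustration: the function names, the decision-stump weak learner, the per-sample `costs` list, and the particular choice of multiplying each sample weight by its cost before training the weak learner and computing its vote. It should not be read as the thesis's own algorithm.

```python
import math

def train_stump(X, y, w):
    """Return the (feature, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                preds = [pol if x[j] >= thr else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best[1:]

def stump_predict(stump, x):
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol

def cost_sensitive_adaboost(X, y, costs, rounds=10):
    """Assumed variant: sample weights are scaled by per-sample costs each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Cost-weighted distribution: costly samples count more (illustrative choice).
        cw = [c * wi for c, wi in zip(costs, w)]
        stump = train_stump(X, y, cw)
        preds = [stump_predict(stump, x) for x in X]
        right = sum(v for v, p, yi in zip(cw, preds, y) if p == yi)
        wrong = sum(v for v, p, yi in zip(cw, preds, y) if p != yi)
        if wrong == 0:          # weak learner is already perfect on the training set
            ensemble.append((1.0, stump))
            break
        if right <= wrong:      # no better than chance under the cost weighting
            break
        alpha = 0.5 * math.log(right / wrong)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight; correctly classified ones lose weight.
        w = [v * math.exp(-alpha * yi * p) for v, p, yi in zip(cw, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(s, x) for a, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data: one minority (+1) sample at x = 1 among majority (-1) samples.
X = [[0], [1], [2], [3]]
y = [-1, 1, -1, -1]
plain = cost_sensitive_adaboost(X, y, [1, 1, 1, 1], rounds=1)
costly = cost_sensitive_adaboost(X, y, [1, 5, 1, 1], rounds=1)
print(predict(plain, [1]))   # -1: with equal costs the minority sample is missed
print(predict(costly, [1]))  # 1: a higher cost on the minority sample recovers it
```

With equal costs the single stump predicts the majority class everywhere, which is 75% accurate yet never identifies the minority sample. Raising the cost of that sample shifts the chosen stump so the minority sample is recognised, at the price of one majority error. The cost value 5 is arbitrary here; in practice it would have to be tuned, for example by the kind of empirical search the abstract mentions for bi-class problems.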


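Bibliographic details

Author: Sun, Yanmin
Author affiliation: University of Waterloo (Canada)
Degree-granting institution: University of Waterloo (Canada)
Subject: Computer Science
Degree: Ph.D.
Year: 2007
Pages: 181
Original format: PDF
Language: English
CLC classification: Automation technology, computer technology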

