Classification techniques for noisy and imbalanced data.

Abstract

Machine learning techniques allow useful insight to be distilled from increasingly massive repositories of stored data. Because these data mining techniques can learn only the patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real-world classification applications are class noise and class imbalance. Class noise, in which values of the dependent attribute (the class label) are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset, and in such cases classifiers often display poor accuracy on the minority class. Classification performance degrades even further when the two issues occur simultaneously.

To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, the level of class noise, and the distribution of class noise among the classes affect the results. The dissertation first presents an empirical analysis of the efficiency of classifier-based noise detection. It then proposes and tests an intelligent data sampling technique based on noise detection. It also introduces several hybrid classifier ensemble techniques for addressing class noise and imbalance. Finally, it reports a detailed empirical investigation of classification filtering aimed at determining best practices.
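The abstract does not spell out the dissertation's noise detection procedure. As a minimal sketch of the general idea behind classifier-based noise detection (a "classification filter"), the Python below assumes a 3-nearest-neighbour learner, 5-fold cross-validation, and synthetic two-class Gaussian data with a 190:10 imbalance, and flags every instance whose out-of-fold prediction disagrees with its recorded label. All function names and parameter values are hypothetical illustrations, not the author's implementation; noise is injected uniformly at random across both classes, whereas the abstract also varies how noise is distributed among the classes.

```python
import random
from collections import Counter

# Hypothetical helpers: a generic classification-filter sketch, not the dissertation's code.


def make_imbalanced_data(n_major, n_minor, rng):
    """Two 2-D Gaussian blobs: class 0 (majority) near (0, 0), class 1 (minority) near (4, 4)."""
    points, labels = [], []
    for _ in range(n_major):
        points.append((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
        labels.append(0)
    for _ in range(n_minor):
        points.append((rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)))
        labels.append(1)
    return points, labels


def inject_class_noise(labels, noise_rate, rng):
    """Flip the binary label of a random noise_rate fraction of instances (uniform across classes)."""
    n_flip = round(noise_rate * len(labels))
    flipped = set(rng.sample(range(len(labels)), n_flip))
    noisy = [1 - y if i in flipped else y for i, y in enumerate(labels)]
    return noisy, flipped


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to query (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classification_filter(points, labels, n_folds=5, k=3, seed=0):
    """Return indices whose out-of-fold k-NN prediction disagrees with the recorded label."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    flagged = set()
    for fold in range(n_folds):
        test_idx = order[fold::n_folds]  # every n_folds-th shuffled index forms one fold
        held_out = set(test_idx)
        train = [(points[i], labels[i]) for i in order if i not in held_out]
        for i in test_idx:
            if knn_predict(train, points[i], k) != labels[i]:
                flagged.add(i)
    return flagged


if __name__ == "__main__":
    rng = random.Random(42)
    points, clean = make_imbalanced_data(n_major=190, n_minor=10, rng=rng)
    noisy, flipped = inject_class_noise(clean, noise_rate=0.05, rng=rng)
    flagged = classification_filter(points, noisy)
    caught = flagged & flipped
    print(f"minority share: {sum(clean) / len(clean):.2%}")
    print(f"flipped: {len(flipped)}, flagged: {len(flagged)}, correctly flagged: {len(caught)}")
```

Comparing `flagged` with the known `flipped` set yields detection precision and recall, one way to quantify noise detection efficiency. The choice of learner, the number of folds, and whether flagged instances are removed or relabelled are all design decisions this sketch leaves open.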

Bibliographic details

Author: Napolitano, Amri
Affiliation: Florida Atlantic University
Degree-granting institution: Florida Atlantic University
Discipline: Computer Science
Degree: Ph.D.
Year: 2009
Pages: 218
Original format: PDF
Language: English

Similar documents

1. An intelligent undersampling technique based upon intuitionistic fuzzy sets to alleviate class imbalance problem of classification with noisy environment [J]. Prabhjot Kaur, Anjana Gosain. International Journal of Intelligent Engineering Informatics, 2018, no. 5.

2. Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data [J]. Khoshgoftaar T. M., Van Hulse J., Napolitano A. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 2011, no. 3.

3. SMOTE-NaN-DE: Addressing the noisy and borderline examples problem in imbalanced classification by natural neighbors and differential evolution [J]. Li Junnan, Zhu Qingsheng, Wu Quanwang. Knowledge-Based Systems, 2021, no. Jula8.

4. Effectiveness of Basic and Advanced Sampling Strategies on the Classification of Imbalanced Data. A Comparative Study Using Classical and Novel Metrics [C]. Mohamed S. Kraiem, Maria N. Moreno. International Conference on Hybrid Artificial Intelligent Systems, 2017.

5. Learning in extreme conditions: Online and active learning with massive, imbalanced and noisy data [D]. Ertekin, Seyda. 2009.

6. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [O]. Jinyan Li, Simon Fong, Yunsick Sung. 2016.

7. Variance Ranking Attributes Selection Techniques for Binary Classification Problem in Imbalance Data [O]. Solomon H. Ebenuwa, Mhd Saeed Sharif, Mamoun Alazab. 2019.
