Classification techniques for noisy and imbalanced data.


Abstract

Machine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. Because these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real-life classification applications are class noise and class imbalance. Class noise, where dependent attribute values (class labels) are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset; in such cases, classifiers often display poor accuracy on the minority class. The reduction in classification performance becomes even worse when the two issues occur simultaneously.

To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments are performed to assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, the level of class noise, and the distribution of class noise among the classes affect results. An empirical analysis of classifier-based noise detection efficiency appears first. Subsequently, an intelligent data sampling technique, based on noise detection, is proposed and tested. Several hybrid classifier ensemble techniques for addressing class noise and imbalance are introduced. Finally, a detailed empirical investigation of classification filtering is performed to determine best practices.
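For readers unfamiliar with the term, the general idea of classification filtering can be sketched as follows: each instance is held out in a cross-validation fold, a classifier is trained on the remaining data, and instances whose recorded label disagrees with the prediction are flagged as possibly noisy. This is only a minimal illustration and not the dissertation's procedure. The nearest-centroid classifier, the function names, the fold assignment, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def fit_centroids(X, y):
    """Compute one mean feature vector (centroid) per class label."""
    sums, counts = {}, defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def classification_filter(X, y, n_folds=3):
    """Flag instances misclassified while held out of training.

    Folds are assigned by index modulo n_folds (an illustrative choice).
    """
    flagged = []
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        centroids = fit_centroids([X[i] for i in train],
                                  [y[i] for i in train])
        for i in held_out:
            if predict(centroids, X[i]) != y[i]:
                flagged.append(i)
    return sorted(flagged)


# Hypothetical toy data: a "neg" majority near 0-1, a "pos" minority
# near 9-10, and one instance (index 6) recorded with the wrong label.
X = [(0.0,), (0.2,), (0.4,), (0.6,), (0.8,), (1.0,),
     (0.5,), (9.0,), (9.5,), (10.0,)]
y = ["neg"] * 6 + ["pos"] * 4

flagged = classification_filter(X, y)
print(flagged)  # [6]
```

In this toy case only the mislabeled instance is flagged. With real data, correctly labeled minority-class examples may also be misclassified and flagged, which is one reason noise and imbalance are harder to handle together than separately.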

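Bibliographic record

  • Author

    Napolitano, Amri

  • Author affiliation

    Florida Atlantic University

  • Degree-granting institution: Florida Atlantic University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 218 p.
  • Total pages: 218
  • Original format: PDF
  • Language: English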
