A selective sampling method for imbalanced data learning on support vector machines.

Abstract

The class imbalance problem in classification has been recognized as a significant research problem in recent years, and a number of methods have been introduced to improve classification results. Rebalancing class distributions (such as over-sampling or under-sampling of learning datasets) has been popular because it is easy to implement and performs relatively well. For the Support Vector Machine (SVM) classification algorithm, research efforts have focused on reducing the size of learning sets because of the algorithm's sensitivity to dataset size. In this dissertation, we propose a metaheuristic approach (a Genetic Algorithm) for under-sampling an imbalanced dataset in the context of an SVM classifier. The goal of this approach is to find an optimal learning set from imbalanced datasets without the empirical studies normally required to identify an optimal class distribution. Experimental results on real datasets indicate that this metaheuristic under-sampling performed well in rebalancing class distributions. Furthermore, an iterative sampling methodology was used to produce smaller learning sets by removing redundant instances. It incorporates informative and representative under-sampling mechanisms to speed up the learning procedure for imbalanced data learning with an SVM. Compared with existing rebalancing methods and the metaheuristic under-sampling approach, this iterative methodology not only provides good performance but also enables an SVM classifier to learn from very small learning sets. For large-scale imbalanced datasets, this methodology provides an efficient and effective solution for imbalanced data learning with an SVM.
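As an illustration only (not the dissertation's exact algorithm), the sketch below shows one common way GA-based under-sampling for an SVM can be set up: a binary chromosome marks which majority-class instances to keep, and the fitness of a chromosome is the balanced accuracy of an SVM trained on the selected majority subset plus all minority instances, measured on a held-out validation split. The function names, GA operators (truncation selection, one-point crossover, bit-flip mutation), all parameters, and the scikit-learn/NumPy usage are assumptions made for this example; `X` and `y` are assumed to be NumPy arrays.

```python
# Illustrative sketch only: GA-based under-sampling of the majority class
# for an SVM classifier. All design choices here are assumptions for the
# example, not the dissertation's method.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def fitness(mask, X_maj, y_maj, X_min, y_min, X_val, y_val):
    """Score an SVM trained on the selected majority subset plus all minority instances."""
    keep = mask.astype(bool)
    if keep.sum() == 0:                       # an empty majority subset cannot be trained on
        return 0.0
    X_tr = np.vstack([X_maj[keep], X_min])
    y_tr = np.concatenate([y_maj[keep], y_min])
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
    return balanced_accuracy_score(y_val, clf.predict(X_val))

def ga_undersample(X, y, minority_label=1, pop_size=20, generations=30, mutation_rate=0.02):
    """Evolve a binary mask over majority-class instances; return the under-sampled learning set."""
    # Hold out a validation split so fitness is measured on data the SVM did not train on.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    X_min, y_min = X_tr[y_tr == minority_label], y_tr[y_tr == minority_label]
    X_maj, y_maj = X_tr[y_tr != minority_label], y_tr[y_tr != minority_label]
    n = len(X_maj)
    # Initialize masks that keep roughly as many majority as minority instances.
    keep_prob = min(1.0, len(X_min) / n)
    pop = (rng.random((pop_size, n)) < keep_prob).astype(int)
    for _ in range(generations):
        scores = np.array([fitness(m, X_maj, y_maj, X_min, y_min, X_val, y_val) for m in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                               # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n) < mutation_rate                   # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(m, X_maj, y_maj, X_min, y_min, X_val, y_val) for m in pop])
    best = pop[int(np.argmax(scores))].astype(bool)
    X_bal = np.vstack([X_maj[best], X_min])
    y_bal = np.concatenate([y_maj[best], y_min])
    return X_bal, y_bal                        # rebalanced learning set for the final SVM fit
```

The rebalanced set would then be used as the SVM's learning set, e.g. `SVC().fit(*ga_undersample(X, y))`; other fitness measures (G-mean, F-measure) or selection and crossover schemes could be substituted without changing the overall structure.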

Bibliographic details

  • Author

    Choi, Jong Myong.

  • Affiliation

    Iowa State University.

  • Awarding institution: Iowa State University.
  • Subjects: Statistics; Computer Science; Industrial Engineering.
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 106 p.
  • Total pages: 106
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
