Conference: Iberoamerican Congress on Pattern Recognition

Managing Imbalanced Data Sets in Multi-label Problems: A Case Study with the SMOTE Algorithm

Abstract

Multi-label learning has become an increasingly active area in the machine learning community, since a wide variety of real-world problems are naturally multi-labeled. However, it is not uncommon for the number of samples to differ considerably from one class to another, which poses an additional challenge for the learning algorithm. SMOTE (Synthetic Minority Over-sampling Technique) is an oversampling technique that has been successfully applied to balancing single-label data sets, but it has not so far been used in multi-label frameworks. In this work, several strategies for generating synthetic samples to balance data sets when training multi-label algorithms are proposed and compared. Results show that a correct selection of seed samples for oversampling improves the classification performance of multi-label algorithms. Uniform generation oversampling provides an efficient methodology for a wide range of real-world problems.
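The abstract does not spell out the individual seed-selection strategies, so the following is only a minimal sketch of how SMOTE's interpolation step might be carried over to multi-label data. The choices it makes are assumptions, not the paper's method: every sample that carries the under-represented label is treated as a seed, nearest neighbours are searched among those seeds with Euclidean distance, and each synthetic sample simply inherits its seed's label set. The names `label_counts` and `smote_multilabel` are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

# A sample is a (feature vector, label set) pair; this layout is an assumption.
Sample = Tuple[List[float], Set[str]]


def label_counts(data: Sequence[Sample]) -> Dict[str, int]:
    """Count how many samples carry each label, exposing per-class imbalance."""
    counts: Dict[str, int] = {}
    for _, labels in data:
        for lab in labels:
            counts[lab] = counts.get(lab, 0) + 1
    return counts


def smote_multilabel(
    data: Sequence[Sample],
    target_label: str,
    n_new: int,
    k: int = 5,
    rng: Optional[random.Random] = None,
) -> List[Sample]:
    """Create n_new synthetic samples for target_label by SMOTE interpolation.

    Hypothetical choices (not from the paper): every sample carrying
    target_label is a seed, neighbours are searched among the seeds with
    Euclidean distance, and a synthetic sample inherits its seed's label set.
    """
    if rng is None:
        rng = random.Random()
    seeds = [s for s in data if target_label in s[1]]
    if len(seeds) < 2:
        raise ValueError("need at least two seed samples to interpolate")
    # Cannot use more neighbours than there are other seeds, and need at least one.
    k = max(1, min(k, len(seeds) - 1))
    synthetic: List[Sample] = []
    for _ in range(n_new):
        i = rng.randrange(len(seeds))
        x, labels = seeds[i]
        # The k seeds closest to x, excluding x itself.
        others = sorted(
            (seeds[j][0] for j in range(len(seeds)) if j != i),
            key=lambda v: math.dist(x, v),
        )[:k]
        neighbour = rng.choice(others)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        new_x = [a + gap * (b - a) for a, b in zip(x, neighbour)]
        synthetic.append((new_x, set(labels)))
    return synthetic
```

For example, in a data set where label `b` appears on only two of five samples, `smote_multilabel(data, "b", n_new=3, k=1)` returns three points lying on the segment between those two samples, each carrying `b`. Because the abstract reports that seed selection affects classification performance, the line that builds `seeds` is the natural place to experiment with alternative strategies.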