European Conference on Artificial Intelligence; Conference on Prestigious Applications of Intelligent Systems

Sequencing, Combining and Sampling Classifiers to Help Find Needles in Haystacks

Abstract

Many binary prediction problems involve imbalanced datasets in which the ratio of minority-class to majority-class instances is very low. This is especially common when machine learning is used to detect fraud, errors, or exceptions. In this paper, we address the problem of extreme imbalance, i.e., cases where the ratio of majority to minority instances exceeds 500. Given the scarcity of minority examples, oversampling is not sensible because of its high computational cost. Hence, we explore and extend undersampling approaches. Specifically, we propose a modeling framework (i.e., a sequence of modeling steps) that seeks to leverage as much of the training data as possible. Our results indicate that the framework achieves a better trade-off between false positives and false negatives, which makes it more suitable for real-life applications.
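To make the terms in the abstract concrete, the following is a minimal sketch of the imbalance ratio (majority count over minority count) and of one generic way to undersample while still using every majority example: partitioning the majority class into disjoint chunks and pairing each chunk with all minority examples. This is an illustration under stated assumptions, not the framework proposed in the paper, whose individual steps the abstract does not describe. The function names `imbalance_ratio` and `undersampled_subsets`, the 0/1 label encoding with 1 as the minority class, and the toy data are all hypothetical.

```python
import random


def imbalance_ratio(labels):
    """Return majority count divided by minority count for binary 0/1 labels."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    minority, majority = min(pos, neg), max(pos, neg)
    if minority == 0:
        raise ValueError("need at least one example of each class")
    return majority / minority


def undersampled_subsets(labels, n_subsets, seed=0):
    """Build index subsets that each hold all minority examples plus one
    disjoint chunk of the shuffled majority examples.

    Assumption (not from the paper): label 1 marks the minority class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == 1]
    majority_idx = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(majority_idx)
    subsets = []
    for k in range(n_subsets):
        # Every n_subsets-th majority index, starting at offset k,
        # so the chunks are disjoint and together cover the majority class.
        chunk = majority_idx[k::n_subsets]
        subsets.append(sorted(minority_idx + chunk))
    return subsets


# Toy data: 4 minority and 2400 majority examples.
labels = [1] * 4 + [0] * 2400
ratio = imbalance_ratio(labels)  # 600.0, above the "extreme" threshold of 500
subsets = undersampled_subsets(labels, n_subsets=6)
```

Each subset here contains 404 indices (4 minority plus 400 majority), so a classifier trained on any one subset sees a ratio of 100 rather than 600, while the six subsets together still cover all of the training data. How such per-subset models would be sequenced or combined is left open, since the abstract gives no detail on that.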
