Conference: IEEE International Conference on Tools with Artificial Intelligence

Data Sampling Approaches with Severely Imbalanced Big Data for Medicare Fraud Detection

Abstract

Class imbalance is an important problem in machine learning. With increases in available information and the growing use of Big Data sources to extract meaning from data, the challenges associated with class imbalance continue to influence research and shape business value. In this paper, we focus on using highly imbalanced Big Data from Medicare to detect provider claims fraud. We combine three Medicare parts and generate fraud labels using real-world excluded providers. The number of known fraudulent providers is very small, with 0.062% of the combined dataset being labeled as fraud, indicating severe class imbalance. To address class imbalance concerns, we provide experimental results incorporating six different data sampling methods (undersampling and oversampling) to create datasets for five class ratios (imbalanced to balanced), as well as using the full dataset (with no sampling). Three state-of-the-art machine learning models with Apache Spark are used to assess Medicare fraud detection performance across data sampling methods and class ratios. We demonstrate that data sampling, in particular random undersampling, presents good results across all learners, whereas oversampling provides no benefit versus models built using the full dataset.
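
Random undersampling to a chosen class ratio, as mentioned in the abstract, can be illustrated with a minimal sketch. This is plain Python on toy labels, not the authors' Apache Spark pipeline; the function name `random_undersample`, the `minority_fraction` parameter, the toy data, and the two example ratios are all assumptions made for illustration (the abstract does not list the five ratios used).

```python
import random


def random_undersample(labels, minority_fraction, seed=0):
    """Keep every positive (label 1) and a random subset of negatives (label 0)
    so that positives make up roughly `minority_fraction` of the result.

    Returns the sorted indices of the retained rows. Hypothetical helper,
    not taken from the paper.
    """
    if not 0 < minority_fraction < 1:
        raise ValueError("minority_fraction must be in (0, 1)")
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Negatives needed to reach the target ratio, capped at what is available.
    wanted = round(len(pos) * (1 - minority_fraction) / minority_fraction)
    n_neg = min(len(neg), wanted)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return sorted(pos + rng.sample(neg, n_neg))


# Toy data (illustrative only): 10 fraud labels among 10,000 providers.
labels = [1] * 10 + [0] * 9990

balanced = random_undersample(labels, 0.5)       # 50:50 fraud to non-fraud
ten_to_ninety = random_undersample(labels, 0.1)  # 10:90 fraud to non-fraud
print(len(balanced), len(ten_to_ninety))
```

Only the majority class is sampled; every known fraud label is retained, which matters when positives are as scarce as the 0.062% reported here (about one fraudulent provider in every 1,600).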