Journal of Big Data

Big Data fraud detection using multiple medicare data sources


Abstract

In the United States, advances in technology and medical sciences continue to improve the general well-being of the population. With this continued progress, programs such as Medicare are needed to help manage the high costs associated with quality healthcare. Unfortunately, there are individuals who commit fraud for nefarious reasons and personal gain, limiting Medicare’s ability to effectively provide for the healthcare needs of the elderly and other qualifying people. To minimize fraudulent activities, the Centers for Medicare and Medicaid Services (CMS) released a number of “Big Data” datasets for different parts of the Medicare program. In this paper, we focus on the detection of Medicare fraud using the following CMS datasets: (1) Medicare Provider Utilization and Payment Data: Physician and Other Supplier (Part B), (2) Medicare Provider Utilization and Payment Data: Part D Prescriber (Part D), and (3) Medicare Provider Utilization and Payment Data: Referring Durable Medical Equipment, Prosthetics, Orthotics and Supplies (DMEPOS). Additionally, we create a fourth dataset which is a combination of the three primary datasets. We discuss data processing for all four datasets and the mapping of real-world provider fraud labels using the List of Excluded Individuals and Entities (LEIE) from the Office of the Inspector General. Our exploratory analysis on Medicare fraud detection involves building and assessing three learners on each dataset. Based on the Area under the Receiver Operating Characteristic (ROC) Curve performance metric, our results show that the Combined dataset with the Logistic Regression (LR) learner yielded the best overall score at 0.816, closely followed by the Part B dataset with LR at 0.805. Overall, the Combined and Part B datasets produced the best fraud detection performance with no statistical difference between these datasets, over all the learners. Therefore, based on our results and the assumption that there is no way to know within which part of Medicare a physician will commit fraud, we suggest using the Combined dataset for detecting fraudulent behavior when a physician has submitted payments through any or all Medicare parts evaluated in our study.
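
The pipeline summarized in the abstract (combine the datasets, map LEIE exclusions to fraud labels, score providers, evaluate by area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' implementation. The provider IDs, feature names, and toy score are hypothetical. The join key (a shared provider identifier) and the inner-join behavior are assumptions the abstract does not state. The `roc_auc` helper uses the standard pairwise definition of the metric in place of a trained learner.

```python
def combine_by_provider(*datasets):
    """Merge per-dataset feature dicts on a shared provider ID.

    Assumption: only providers present in every dataset are kept (inner join).
    """
    common = set(datasets[0])
    for ds in datasets[1:]:
        common &= set(ds)
    combined = {}
    for pid in common:
        row = {}
        for ds in datasets:
            row.update(ds[pid])
        combined[pid] = row
    return combined


def label_fraud(provider_ids, excluded_ids):
    """Label a provider 1 if it appears in the exclusion list, else 0."""
    return {pid: int(pid in excluded_ids) for pid in provider_ids}


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive outscores a random
    negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy records keyed by a made-up provider ID.
part_b = {"P1": {"b_claims": 10}, "P2": {"b_claims": 40}, "P3": {"b_claims": 5}}
part_d = {"P1": {"d_cost": 100.0}, "P2": {"d_cost": 900.0},
          "P3": {"d_cost": 50.0}, "P4": {"d_cost": 70.0}}
dmepos = {"P1": {"e_items": 1}, "P2": {"e_items": 7}, "P3": {"e_items": 2}}

combined = combine_by_provider(part_b, part_d, dmepos)
labels = label_fraud(combined, excluded_ids={"P2"})

# Toy anomaly score standing in for a learner's predicted probability.
ids = sorted(combined)
scores = [combined[pid]["b_claims"] + combined[pid]["e_items"] for pid in ids]
auc = roc_auc([labels[pid] for pid in ids], scores)
```

In practice the toy score would be replaced by the output of a trained classifier such as Logistic Regression, and the evaluation would be repeated for each dataset and learner.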
