
Medicare Fraud Detection using CatBoost



Abstract

In this study we investigate the performance of CatBoost in the task of identifying Medicare fraud. The Medicare claims data we use as input for CatBoost contain a number of categorical features. Some of these features, such as the procedure code and provider zip code, have thousands of possible values. Our first contribution is to show how CatBoost lets us eliminate some data pre-processing steps that authors of related works take. Our second contribution is to show that CatBoost's performance, measured by Area Under the Receiver Operating Characteristic Curve (AUC), improves when we include an additional categorical feature (provider state) as input. We show that CatBoost attains better performance than XGBoost in the task of Medicare fraud detection with respect to the AUC metric. In our experiments XGBoost obtains a mean AUC value of 0.7615 while CatBoost obtains a mean AUC value of 0.7851, a difference that is statistically significant at a 99% confidence level (with p-value 0). Moreover, when we include the additional categorical feature (healthcare provider state) in our data analysis, CatBoost yields a mean AUC value of 0.8902, which is also statistically significant at a 99% confidence level (with p-value 0). Our empirical evidence indicates that CatBoost is a better alternative to XGBoost for Medicare fraud detection, especially when dealing with categorical features.
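The AUC metric used for the comparison above can be illustrated with a minimal sketch. This is not the authors' evaluation code; it is a plain-Python computation of AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration and have no connection to the Medicare data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation.

    labels: iterable of 0/1 values (1 = positive, e.g. a fraudulent provider)
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the end of the group of tied scores starting at index i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of their 1-based ranks (i+1 .. j).
        avg_rank = (i + 1 + j) / 2.0
        positives_in_group = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_group
        i = j

    # Subtract the smallest possible positive rank sum, then normalise
    # by the number of (positive, negative) pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy data (hypothetical): two negatives, two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported mean values of 0.7615, 0.7851, and 0.8902 should be read.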
