A scalable hybrid model for health care insurance fraud detection using association rules and random forest.

Abstract

Fraud detection is a growing area of focus in the health care industry because of its major effects on health care expenses and quality of service. This research therefore proposes a novel approach, the Hybrid Association Rules and Random Forest (HARRF), to detect fraud in health care insurance claims. In HARRF, frequent patterns extracted with Frequent Pattern Growth (FP-growth) are used to construct association rules, which then serve as a new feature space for the data. The extracted rules are filtered and used to transform the training data into this feature space, producing the Transformed Feature Matrix (TFM). The transformation unifies the feature space across claims, condenses the information, and reduces the dataset size. The TFM is then used as the input to train a Random Forest (RF) classifier. The testing data is likewise transformed into a separate TFM using the same feature space. A public Medicare insurance claims dataset (DE-SynPUF), with 160 million claims for 2.4 million beneficiaries, is used to train and validate the proposed methodology. HARRF is validated through several experiments and 5-fold cross-validation. In addition, design of experiments is used to identify the parameters critical to prediction accuracy, from which parameter tuning strategies are derived. The average model accuracy achieved through cross-validation is 90%. Because of the size of the data, distributed computing (Hadoop) is used to train and test the proposed methodology. Finally, this research studies the effect of the number of Hadoop nodes on RF run time.
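The rule-to-feature transformation described in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the abstract does not say how a rule is encoded as a feature, so the encoding here (a column is 1 when a claim contains every item of the rule) is an assumption. The brute-force itemset search is a stand-in for FP-growth, and the claim codes, thresholds, and function names are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent itemset mining (a simple stand-in for FP-growth)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                frequent[cset] = support
    return frequent


def association_rules(frequent, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[a] is always present.
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


def transform(transactions, rules):
    """Build a binary matrix: one row per claim, one column per rule.

    Assumed encoding: a cell is 1 when the claim contains all items of the rule.
    """
    return [[1 if (a | c) <= t else 0 for a, c, _ in rules] for t in transactions]


# Hypothetical training claims, each reduced to a set of categorical codes.
train_claims = [
    frozenset({"dx:250", "proc:A", "prov:X"}),
    frozenset({"dx:250", "proc:A"}),
    frozenset({"dx:401", "proc:B"}),
    frozenset({"dx:250", "proc:A", "prov:Y"}),
]

# Rules are mined from the training data only, then reused for the test data
# so that both matrices share the same feature space.
rules = association_rules(frequent_itemsets(train_claims, 0.5), 0.8)
train_tfm = transform(train_claims, rules)

test_claims = [frozenset({"dx:250", "proc:A"}), frozenset({"dx:401"})]
test_tfm = transform(test_claims, rules)
```

In a full pipeline along the lines of the abstract, `train_tfm` and its fraud labels would be passed to a random forest classifier, and `test_tfm` would be used for evaluation.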

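Bibliographic record

  • Author

    Alqudah, Mohammad Khaled

  • Author affiliation

    State University of New York at Binghamton

  • Degree-granting institution: State University of New York at Binghamton
  • Discipline: Industrial engineering
  • Degree: M.Eng.
  • Year: 2015
  • Pages: 115 p.
  • Total pages: 115
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification: Aquaculture, fisheries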
