An Improved XGBoost Model Based on Spark for Credit Card Fraud Prediction

Abstract

Credit card fraud causes large economic losses for many financial institutions. Because credit card fraud datasets are both highly imbalanced and very large, an improved XGBoost model based on Spark is proposed. In this project, the SMOTE algorithm is used to balance the training set, an XGBoost classifier running on Spark serves as the fraud detection mechanism, and the test sets are then classified in parallel. In the model comparison experiment, the proposed model is compared with a logistic regression model, a decision tree model, a random forest model, and the original XGBoost model. The experimental results show that the proposed model performs best on all three metrics, Recall, F1-Score, and AUC, leading the second-ranked model by 9.1%, 1.4%, and 1.2% respectively. In the speedup experiment, the speedups on datasets of 70,000, 140,000, and 280,000 samples are 2.06, 3.28, and 3.75 respectively. Together, these two experiments show that the proposed model can predict credit card fraud accurately and efficiently and is effective in practice.
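
The abstract does not describe how the balancing step is implemented. As a rough illustration only, the core idea of SMOTE (creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched on a single machine as follows. The function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the paper; the authors' Spark-based pipeline may differ substantially.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from the minority class.

    Illustrative sketch only: each synthetic point lies on the line
    segment between a randomly chosen minority sample and one of its
    k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank all other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical feature vectors, not real fraud data).
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
new_points = smote(fraud, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always falls inside the range spanned by the original minority class. In practice one would more likely rely on an established library implementation than on a hand-written loop like this one.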
