Journal: Mathematical research letters: MRL

Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection

Abstract

Banks lose millions of dollars each year for several reasons, the most important of which is credit card fraud. The question is how to cope with the challenges this kind of fraud poses, and a skewed class distribution ("class imbalance") is one of the most important. In this study, we therefore explore four data mining techniques, namely naive Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF), on actual credit card transactions from European cardholders. This paper offers four major contributions. First, we used under-sampling to balance the dataset, because the classes are highly imbalanced, which implies a skewed distribution. Second, we applied NB, SVM, KNN, and RF to the under-sampled data to classify transactions as fraudulent or genuine, then computed performance measures from a confusion matrix and compared them. Third, we adopted 10-fold cross-validation (CV) to test the accuracy of the four models, reporting the standard deviation, and compared the results across all models. Fourth, we evaluated these models on the entire (skewed) dataset using the confusion matrix and the AUC (Area Under the ROC Curve) ranking measure to determine which model is best for a particular type of fraud. The best accuracies for the NB, SVM, KNN, and RF classifiers are 97.80%, 97.46%, 98.16%, and 98.23%, respectively. Comparisons across four dataset splits, (75:25), (90:10), (66:34), and (80:20), showed that RF performs better than NB, SVM, and KNN, and that applying the proposed models to the entire (skewed) dataset achieved better results than applying them to the under-sampled dataset.
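The evaluation steps named in the abstract (under-sampling, confusion-matrix counts, AUC) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names (`undersample`, `confusion_counts`, `auc`), the toy labels, and the toy scores are all hypothetical, and the four classifiers themselves are omitted.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the fraud class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy skewed dataset (assumption): 20 fraudulent rows out of 1000.
labels = [1] * 20 + [0] * 980
rows = list(range(1000))
bal_rows, bal_labels = undersample(rows, labels)
print(Counter(bal_labels))  # both classes now have 20 rows

# Toy predictions and scores (assumption), only to exercise the metrics.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(tp, fp, fn, tn, accuracy)
print(auc(y_true, [0.9, 0.4, 0.5, 0.1]))
```

The AUC here is computed by direct pairwise comparison, which is quadratic in the number of samples; it is chosen for readability in a sketch, not for use on a full transaction dataset.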