Conference: IEEE Asia-Pacific Conference on Computer Science and Data Engineering
Random Forest Classifier for Detecting Credit Card Fraud based on Performance Metrics

Abstract

Many classification algorithms are available, but a single classifier that delivers the highest accuracy for a given problem domain is hard to find. A classification algorithm is a technique for mapping data into known classes or outputs. Credit card fraud is a problem area in which classification algorithms have been applied extensively. Credit card fraud detection is not a new research area, but there is still scope to narrow down the most reliable classification algorithm for detecting fraud in real time. This paper investigates which classification algorithm performs best at detecting credit card fraud on benchmark datasets. Random Forest was found to have the best accuracy compared with the other classifiers. The study offers a guideline to assist researchers in choosing the best classification scheme for any credit card fraud dataset. The two datasets used in this research are imbalanced; therefore, for a better comparison of the algorithms, a balanced set is also used. The datasets are balanced using the Synthetic Minority Oversampling Technique (SMOTE). Results are compared across six algorithms: Random Forest, Logistic Regression, Neural Networks, Support Vector Machines (SVMs), Naive Bayes, and K-Nearest Neighbor (KNN). The results are compared using two software tools, Weka and Python. The outcome of the experiments shows that the methodology is of great assistance in practical applications.
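
The abstract names synthetic minority oversampling as the balancing step but gives no implementation details. The following is a minimal sketch of the core idea only, assuming numeric features and Euclidean distance: each synthetic fraud sample is an interpolation between a real minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 8 legitimate vs. 3 fraudulent transactions, 2 features each.
legit = [[float(x), float(x % 3)] for x in range(8)]
fraud = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Oversample the fraud class until both classes have the same size.
new_fraud = smote(fraud, n_synthetic=len(legit) - len(fraud), k=2)
balanced_fraud = fraud + new_fraud
print(len(legit), len(balanced_fraud))  # prints: 8 8
```

Because every synthetic point lies on a line segment between two real fraud samples, the oversampled class stays inside the region already occupied by the minority data. For real experiments, an established implementation such as the `SMOTE` class in the imbalanced-learn package is likely a better choice than this sketch.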