International Conference on Algorithms and Architectures for Parallel Processing

Enhancing Model Performance for Fraud Detection by Feature Engineering and Compact Unified Expressions

Abstract

The performance of machine learning models can be improved in many ways, including segmentation, treatment of missing and outlier values, feature engineering, feature selection, use of multiple algorithms, algorithm tuning and compactness, and ensemble methods. Feature engineering and model compactness can have a significant impact on an algorithm's performance, but they usually require detailed domain knowledge. Accuracy and compactness of machine learning models are equally important when memory and storage requirements must be kept low. The research in this paper focuses on feature engineering and on the compactness of rulesets. A compact ruleset can make the algorithm more efficient, while deriving new features raises the dimensionality of the dataset, potentially resulting in higher accuracy. We have developed a technique that enhances model performance through feature engineering and compact unified expressions for datasets of unknown domain, using a profile models approach. Classification accuracy is compared using well-known classifiers (Decision Tree, Ripple Down Rule and RandomForest). The technique is applied to a bank fraud analysis dataset and to multiple synthetic bank datasets. Empirical evaluation shows that the ruleset size for the training and prediction datasets is reduced and that other performance metrics, including classification accuracy, also improve. In this paper the transformed data is used for the experimental validation and development of a fraud detection technique, but it can also be used in other domains, especially for scalable and distributed systems.
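The abstract does not specify which features are derived or what form the unified expressions take, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical per-account profile (the median transaction amount) and a hypothetical derived feature (the ratio of an amount to that profile), and shows how one rule on the derived feature can stand in for a separate threshold rule per account. All names, data values and thresholds are invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy transactions: (account_id, amount, is_fraud).
TRANSACTIONS = [
    ("A", 20.0, False), ("A", 25.0, False), ("A", 22.0, False), ("A", 133.0, True),
    ("B", 900.0, False), ("B", 1100.0, False), ("B", 1000.0, False), ("B", 6000.0, True),
]

# Ruleset on the raw feature: one hand-picked threshold rule per account.
RAW_THRESHOLDS = {"A": 100.0, "B": 3000.0}

# Ruleset on the derived feature: a single threshold shared by all accounts.
RATIO_THRESHOLD = 3.0


def build_profiles(rows):
    """Return a per-account profile: the median amount seen for that account."""
    amounts = defaultdict(list)
    for account, amount, _ in rows:
        amounts[account].append(amount)
    return {account: median(values) for account, values in amounts.items()}


def add_ratio_feature(rows, profiles):
    """Append the derived feature amount / profile median to every row."""
    return [
        (account, amount, amount / profiles[account], is_fraud)
        for account, amount, is_fraud in rows
    ]


def classify_raw(account, amount):
    """Apply the per-account rule on the raw amount."""
    return amount > RAW_THRESHOLDS[account]


def classify_ratio(ratio):
    """Apply the single rule on the derived ratio feature."""
    return ratio > RATIO_THRESHOLD
```

On this toy data no single threshold on the raw amount separates the classes, because a fraudulent amount on account A is smaller than every normal amount on account B. The raw ruleset therefore grows with the number of accounts, while the ratio rule stays at one expression.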
