Expert Systems with Applications

Handling Class Imbalance In Customer Churn Prediction



Abstract

Customer churn is often a rare event in service industries, but of great interest and great value. Until recently, however, class imbalance has not received much attention in the context of data mining [Weiss, G. M. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6(1), 7-19]. In this study, we investigate how we can better handle class imbalance in churn prediction. Using more appropriate evaluation metrics (AUC, lift), we investigated the increase in performance of sampling (both random and advanced under-sampling) and two specific modelling techniques (gradient boosting and weighted random forests) compared to some standard modelling techniques.

AUC and lift prove to be good evaluation metrics. AUC does not depend on a threshold, and is therefore a better overall evaluation metric compared to accuracy. Lift is very much related to accuracy, but has the advantage of being widely used in marketing practice [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press].

Results show that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC. Unlike Ling and Li [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press], we find that there is no need to under-sample so that there are as many churners in the training set as non-churners. Results show no increase in predictive performance when using the advanced sampling technique CUBE in this study. This is in line with the findings of Japkowicz [Japkowicz, N. (2000). The class imbalance problem: Significance and strategies. In Proceedings of the 2000 international conference on artificial intelligence (IC-AI'2000): Special track on inductive learning, Las Vegas, Nevada], who noted that using sophisticated sampling techniques did not give any clear advantage. Weighted random forests, as cost-sensitive learners, perform significantly better than random forests, and are therefore advised. They should, however, always be compared to logistic regression. Boosting is a very robust classifier, but never outperforms any other technique.
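To make the two evaluation metrics concrete, the sketch below (a minimal illustration, not code from the paper) computes AUC with scikit-learn and a simple top-fraction lift; the helper `top_decile_lift`, the 10% cut-off, and the toy labels and scores are our assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def top_decile_lift(y_true, y_score, fraction=0.10):
    """Lift: churn rate in the top-scored fraction divided by the overall churn rate."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n_top = max(1, int(len(y_true) * fraction))
    top_idx = np.argsort(y_score)[::-1][:n_top]    # customers with the highest churn scores
    return y_true[top_idx].mean() / y_true.mean()  # relative concentration of churners

# Hypothetical example: y_true are 0/1 churn labels, y_score are model probabilities.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.1, 0.2, 0.9, 0.3, 0.8, 0.2, 0.1, 0.4, 0.7, 0.3])

print("AUC:", roc_auc_score(y_true, y_score))  # threshold-independent ranking quality
print("Top-decile lift:", top_decile_lift(y_true, y_score))
```

Because AUC is computed over the full ranking of scores, it does not depend on any classification threshold, which is the property the abstract highlights.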
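The under-sampling finding (improved performance without forcing a one-to-one churner/non-churner balance) can be illustrated with a plain random under-sampler; the function `random_undersample`, its `majority_ratio` parameter, and the synthetic 5% churn rate below are ours, not taken from the study.

```python
import numpy as np

def random_undersample(X, y, majority_ratio=2.0, seed=0):
    """Keep all churners (y == 1) and a random subset of non-churners,
    so that non-churners outnumber churners by `majority_ratio` to 1."""
    rng = np.random.default_rng(seed)
    minority = np.where(y == 1)[0]
    majority = np.where(y == 0)[0]
    n_keep = min(len(majority), int(len(minority) * majority_ratio))
    kept_majority = rng.choice(majority, size=n_keep, replace=False)
    idx = np.concatenate([minority, kept_majority])
    rng.shuffle(idx)
    return X[idx], y[idx]

# Hypothetical imbalanced data: roughly 5% churners.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
y = (rng.random(2000) < 0.05).astype(int)

X_res, y_res = random_undersample(X, y, majority_ratio=2.0)
print("Churn rate before:", y.mean(), "after:", y_res.mean())
```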

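The cost-sensitive comparison could be approximated in scikit-learn by giving the random forest balanced class weights and benchmarking it against an unweighted forest and logistic regression; this is only a sketch of the idea on synthetic data, not the weighted random forest implementation or data set used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced churn data (roughly 5% churners), for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=5000) > 3.5).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "weighted random forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

Here `class_weight="balanced"` re-weights samples inversely to class frequency, which serves as a simple stand-in for the misclassification costs a weighted random forest assigns to the rare churn class.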