Journal: International Journal of Systems Assurance Engineering and Management

Benchmarking framework for class imbalance problem using novel sampling approach for big data

Abstract

Traditional machine learning techniques need to be strengthened to cope with the vast scale of big data and to learn from it systematically. An unbalanced distribution of classes in big data, commonly known as imbalanced big data, makes the learning problem considerably harder. Conventional methods are being progressively modified, at both the data level and the algorithmic level, to handle and reduce the problem of learning from imbalanced datasets in the context of big data. The current study applies a data-level sampling solution based on cluster heads, which combines the strengths of the K-Means and Fuzzy C-Means clustering approaches. The proposed approach is evaluated with three different classifiers, namely Support Vector Machines, Decision Tree and k-Nearest Neighbor, and is compared with the conventional SMOTE algorithm. The experiments show promising results, with increases of 8.09% in accuracy and 35.71% in AUC for all imbalanced datasets. This work provides a baseline comparison of data-level solutions for imbalanced classification in the big data scenario and proposes an efficient clustering-based solution for the same problem.
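
The abstract does not spell out the sampling procedure, so the following is only a rough illustration of the general idea of sampling by cluster heads, not the authors' method. It assumes one plausible reading: the majority class is clustered with plain K-Means and replaced by the resulting centroids ("cluster heads"), one per minority sample. The Fuzzy C-Means component mentioned in the abstract is omitted, and the function names `kmeans_centroids` and `cluster_head_undersample` are hypothetical.

```python
import random


def kmeans_centroids(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns k centroids (the "cluster heads")."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct points drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_head_undersample(X, y, majority_label, seed=0):
    """Assumed scheme: replace the majority class with one cluster head per minority sample."""
    majority = [x for x, label in zip(X, y) if label == majority_label]
    minority = [(x, label) for x, label in zip(X, y) if label != majority_label]
    k = min(len(majority), len(minority))
    if k == 0:
        raise ValueError("need at least one majority and one minority sample")
    heads = kmeans_centroids(majority, k, seed=seed)
    X_bal = heads + [list(x) for x, _ in minority]
    y_bal = [majority_label] * k + [label for _, label in minority]
    return X_bal, y_bal


# Synthetic toy data (not from the paper): 200 majority points, 10 minority points.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]
y = [0] * 200 + [1] * 10

X_bal, y_bal = cluster_head_undersample(X, y, majority_label=0)
print(y_bal.count(0), y_bal.count(1))  # 10 10
```

The balanced set could then be passed to any of the classifiers named in the abstract. Whether the study undersamples the majority class, oversamples the minority class, or does both is not stated here, so the direction of sampling in this sketch is an assumption.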
