Conference: International Scientific and Practical Conference on Digital Economy

An economic deterministic ensemble classifiers with probabilistic output using for robust quantification: study of unbalanced educational datasets


Abstract

The overall goal of our work is to find economical and robust supervised machine learning methods that are adequate for both individual and collective Student Performance Forecasting (SPF). Individual SPF is handled by well-known classification methods, whereas collective SPF is the subject of quantification learning algorithms, which address the novel task of predicting the frequency of classes in a test sample, e.g. the number of students with an unsatisfactory grade. A review of 86 SPF studies in developing countries shows the need to revise the classification methods in use. The analysis shows that most SPF studies report high overall accuracy for classifiers based on the decision tree (J48), Naive Bayes (NB), Multilayer Perceptron (MLP), k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM) algorithms, but do not take into account the accuracy of the forecast for the minority class. As a result, given the imbalance in the samples, a "useful forecast" with an F1 score above 50% (75%) is obtained in only 1/2 (1/5) of the forecasts. We then present a pivotal study of the efficacy factors of binary SPF (data type, algorithm, sample balancing, number of classes, etc.). Another important finding is that classifiers with a probabilistic Naive Bayes kernel behave more stably across different EDM datasets, outperforming MLP, J48, SVM, and k-NN based classifiers, which sometimes achieved a good forecast but sometimes failed in prediction. Finally, drawing on these experimental findings about the relationship between algorithm and data, we construct heterogeneous ensembles of 3 to 15 members containing strong, moderate, and weak classifiers for deterministic individual SPF by simple voting, and we heuristically propose how individual probabilistic predictions can be generated and aggregated for overall frequency forecasting, i.e. for solving the quantification task.
The proposed methods of ensemble forecasting and ensemble quantification can become the basis for new economical and robust solutions to various real-world problems in the field of machine learning.
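
The abstract does not spell out how the ensemble votes are turned into probabilities or aggregated, so the following Python fragment is only a minimal sketch under stated assumptions, not the authors' method. It assumes a binary task in which label 1 marks the minority class (e.g. an unsatisfactory grade), uses the share of ensemble members voting for that class as a heuristic per-student probability, and compares two ways of estimating the class frequency: counting hard majority-vote predictions and summing the vote shares. All function names and the toy data are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Simple voting: return the most frequent label (ties go to the smaller label)."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def vote_share(votes, positive=1):
    """Assumed heuristic probability: fraction of members voting for `positive`."""
    return sum(1 for v in votes if v == positive) / len(votes)


def classify_and_count(all_votes, positive=1):
    """Frequency estimate from hard majority-vote predictions."""
    return sum(1 for votes in all_votes if majority_vote(votes) == positive)


def probabilistic_count(all_votes, positive=1):
    """Frequency estimate as the sum of per-student vote shares (expected count)."""
    return sum(vote_share(votes, positive) for votes in all_votes)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the `positive` (minority) class; 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): votes of a 3-member ensemble for 5 students.
all_votes = [[1, 1, 1], [0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
y_true = [1, 0, 0, 1, 0]  # hypothetical true labels

y_pred = [majority_vote(v) for v in all_votes]
print("individual forecast:", y_pred)
print("minority-class F1:", f1_score(y_true, y_pred))
print("classify-and-count:", classify_and_count(all_votes))
print("probabilistic count:", probabilistic_count(all_votes))
```

On this toy sample the hard count finds one minority-class student while the summed vote shares give roughly two, which matches the two students labeled 1 in `y_true`. This illustrates why aggregating probabilistic outputs can differ from counting hard predictions on unbalanced data; it says nothing about which estimator performs better in general.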
