
An Apache Spark-Based Platform for Predicting the Performance of Undergraduate Students


Abstract

Educational Data Mining (EDM) plays an important role in higher education institutions. Many algorithms have been used to predict students' GPAs in the next semester's courses. The results can be used to identify students at risk of dropping out early, or to help students choose suitable elective courses. Machine learning methods are the most widely used; however, their accuracy varies from dataset to dataset. More importantly, the performance of a prediction model can be affected by the characteristics of the dataset to which the model is applied. In this paper, we build a distributed platform on Spark to predict missing grades of elective courses for undergraduate students. The paper compares several methods based on the combination of Collaborative Filtering and Matrix Factorization (namely Alternating Least Squares). We evaluate the performance of these algorithms using a dataset provided by Ho Chi Minh University of Technology (HCMUT). The dataset contains information about undergraduate students from 2006 to 2017. Given the characteristics of our dataset, the paper highlights that Alternating Least Squares with a non-negative constraint achieves better results than the other methods compared.
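
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: factorizing a sparse student-by-course grade matrix under a non-negativity constraint and using the factors to fill in a missing grade. It is plain Python rather than the authors' Spark platform, and it swaps in simple coordinate-wise alternating updates clipped at zero instead of the solver Spark uses. The function names, the toy grades, and the `rank` and `reg` values are all hypothetical.

```python
import random

def nonnegative_als(triples, n_students, n_courses, rank=2, reg=0.1, iters=50, seed=0):
    """Fit non-negative factors U (students) and V (courses) to observed grades.

    Alternates between the student and course factors. Each entry is set to the
    exact minimizer of the regularized squared error with all other entries held
    fixed, then clipped at zero to enforce the non-negativity constraint.
    """
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(rank)] for _ in range(n_students)]
    V = [[rng.random() for _ in range(rank)] for _ in range(n_courses)]

    # Index the observed grades by student and by course.
    by_student = [[] for _ in range(n_students)]
    by_course = [[] for _ in range(n_courses)]
    for s, c, g in triples:
        by_student[s].append((c, g))
        by_course[c].append((s, g))

    def update(rows, others, observed):
        for i, row in enumerate(rows):
            for k in range(rank):
                num, den = 0.0, reg  # reg > 0 keeps the denominator positive
                for j, g in observed[i]:
                    other = others[j]
                    # Prediction from every latent factor except k.
                    partial = sum(row[f] * other[f] for f in range(rank) if f != k)
                    num += (g - partial) * other[k]
                    den += other[k] ** 2
                row[k] = max(0.0, num / den)

    for _ in range(iters):
        update(U, V, by_student)  # course factors fixed
        update(V, U, by_course)   # student factors fixed
    return U, V

def predict(U, V, student, course):
    """Predicted grade: dot product of the two factor vectors."""
    return sum(u * v for u, v in zip(U[student], V[course]))

def rmse(U, V, triples):
    """Root mean squared error over a list of (student, course, grade) triples."""
    total = sum((g - predict(U, V, s, c)) ** 2 for s, c, g in triples)
    return (total / len(triples)) ** 0.5

# Hypothetical toy data: (student_id, course_id, grade); student 3 has no
# grade for course 2, which is the entry to be predicted.
observed = [
    (0, 0, 8.0), (0, 1, 7.5), (0, 2, 9.0),
    (1, 0, 6.0), (1, 1, 5.5),
    (2, 1, 7.0), (2, 2, 8.5),
    (3, 0, 9.0), (3, 1, 8.0),
]
U, V = nonnegative_als(observed, n_students=4, n_courses=3)
missing_grade = predict(U, V, 3, 2)
print("training RMSE:", rmse(U, V, observed))
print("predicted grade for student 3, course 2:", missing_grade)
```

On Spark itself, the comparable setting would presumably be the `nonnegative` option of the MLlib `ALS` estimator, though the abstract does not say which API or parameters the authors used.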
