Conference: Future Technologies Conference

A modified principal component analysis approach to automated essay-type grading


Abstract

This study investigates the relative efficacy of using n-gram extracted terms, the aggregation of such terms, and a combination of feature extraction techniques in building an automated essay-type grading (AETG) system. The paper focuses on modifying Principal Component Analysis (PCA) by integrating n-gram terms as input to the PCA algorithm. Hard copies of examiners' marking schemes and soft copies of students' answers for two courses, Management Information System (COM 317) and Research Methodology (COM 325), offered at the Department of Computer Science, Federal Polytechnic, Ilaro, during the 2013/2014 academic session, were used as case studies. The textual contents of the marking schemes were transcribed into electronic documents in the same file format as the students' answers. The documents were pre-processed to remove stopwords, and each keyword was stemmed to address morphological variations. N-gram terms (N = 2, 3) were then extracted across all students' answer scripts and marking scheme documents for each of the two courses. The documents were represented in the vector space model as a Document Term Matrix. The PCA algorithm was modified by integrating n-gram terms as input to the existing PCA, yielding the Modified Principal Component Analysis (MPCA) algorithm. MPCA was used to reduce the sparseness of the matrix. Document similarity was measured using cosine similarity, comparing each student's answer script document vector with the marking scheme document vector. The MPCA-based AETG system outperformed its PCA-based equivalent, showing a high positive correlation and a lower mean absolute error when the human markers' scores were compared with those of the system. In future work, we intend to explore other approaches that will be able to capture non-textual content.
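The n-gram extraction, Document Term Matrix, and cosine similarity steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword list, the sample texts, and all function names are hypothetical, stemming is skipped, and the PCA/MPCA reduction step is omitted because the abstract does not specify its details.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stopword list; the abstract does not name the one used.
STOPWORDS = {"a", "an", "the", "of", "is", "to", "and", "in", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def ngrams(tokens, sizes=(2, 3)):
    """Return every n-gram term (space-joined) for each n in sizes."""
    return [" ".join(tokens[i:i + n])
            for n in sizes
            for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs, sizes=(2, 3)):
    """Build a shared vocabulary and one row of term counts per document."""
    counts = [Counter(ngrams(tokenize(d), sizes)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up texts for illustration only; they are not taken from the study's data.
marking_scheme = "A database is an organized collection of related data."
answers = [
    "A database is an organized collection of related data.",
    "Networks connect computers so they can share printers.",
]

# Row 0 is the marking scheme; the remaining rows are the student answers.
vocab, matrix = document_term_matrix([marking_scheme] + answers)
scheme_vector, answer_vectors = matrix[0], matrix[1:]
scores = [cosine_similarity(scheme_vector, v) for v in answer_vectors]
```

In this sketch each entry of `scores` is the similarity between one answer and the marking scheme. A real system would presumably apply stemming before n-gram extraction, reduce the matrix before computing similarity, and map the similarity to a mark, as the abstract outlines.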
