Journal of Educational Data Mining

Exploring Induced Pedagogical Strategies Through a Markov Decision Process Framework: Lessons Learned


Abstract

An important goal in the design and development of Intelligent Tutoring Systems (ITSs) is to have a system that adaptively reacts to students’ behavior in the short term and effectively improves their learning performance in the long term. Inducing effective pedagogical strategies that accomplish this goal is an essential challenge. To address this challenge, we explore three aspects of a Markov Decision Process (MDP) framework through four experiments. The three aspects are: 1) reward function, examining the impact of immediate and delayed rewards on the effectiveness of the policies; 2) state representation, exploring ECR-based, correlation-based, and ensemble feature selection approaches for representing the MDP state space; and 3) policy execution, investigating the effectiveness of stochastic and deterministic policy execution on learning. The most important result of this work is that there exists an aptitude-treatment interaction (ATI) effect in our experiments: the policies have significantly different impacts on particular types of students, as opposed to the entire population. We refer to the students who are sensitive to the policies as the Responsive group. All of the following results are based on the Responsive group. First, we find that an immediate reward can facilitate a more effective induced policy than a delayed reward. Second, the MDP policies induced based on low correlation-based and ensemble feature selection approaches are more effective than a Random yet reasonable policy. Third, no significant improvement was found using stochastic policy execution, due to a ceiling effect.
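As a rough illustration of the three aspects named in the abstract, the sketch below induces a policy for a toy tabular MDP by value iteration and then executes it either deterministically or stochastically. Everything in it is an assumption made for illustration: the two states, the action names `problem_solving` and `worked_example`, the transition and reward numbers, the softmax form of stochastic execution, and the reading of ECR as the expected value of the optimal value function over an initial-state distribution. It is not the authors' implementation or data.

```python
import math
import random

# Hypothetical toy MDP (not from the paper): two states, two tutor actions.
# P[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
STATES = [0, 1]
ACTIONS = ["problem_solving", "worked_example"]
P = {
    0: {"problem_solving": [(0, 0.4), (1, 0.6)],
        "worked_example": [(0, 0.8), (1, 0.2)]},
    1: {"problem_solving": [(0, 0.1), (1, 0.9)],
        "worked_example": [(0, 0.3), (1, 0.7)]},
}
R = {
    0: {"problem_solving": 1.0, "worked_example": 0.5},
    1: {"problem_solving": 2.0, "worked_example": 1.5},
}
GAMMA = 0.9  # discount factor (assumed)


def q_values(V):
    """One-step lookahead: Q(s, a) = R(s, a) + gamma * E[V(s')]."""
    return {
        s: {a: R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
            for a in ACTIONS}
        for s in STATES
    }


def value_iteration(tol=1e-8):
    """Return (V, Q) once the largest value change falls below tol."""
    V = {s: 0.0 for s in STATES}
    while True:
        Q = q_values(V)
        new_V = {s: max(Q[s].values()) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V, q_values(new_V)
        V = new_V


def deterministic_action(Q, s):
    """Deterministic execution: always take the highest-valued action."""
    return max(ACTIONS, key=lambda a: Q[s][a])


def stochastic_action(Q, s, temperature=1.0, rng=random):
    """Stochastic execution (assumed softmax form): sample an action with
    probability proportional to exp(Q / temperature)."""
    top = max(Q[s].values())  # subtract the max for numerical stability
    weights = [math.exp((Q[s][a] - top) / temperature) for a in ACTIONS]
    return rng.choices(ACTIONS, weights=weights, k=1)[0]


def expected_cumulative_reward(V, initial_dist):
    """ECR under the assumed definition: sum over s of P(s is initial) * V(s)."""
    return sum(initial_dist[s] * V[s] for s in STATES)


if __name__ == "__main__":
    V, Q = value_iteration()
    print("deterministic:", {s: deterministic_action(Q, s) for s in STATES})
    print("stochastic sample in state 0:", stochastic_action(Q, 0))
    print("ECR:", expected_cumulative_reward(V, {0: 0.5, 1: 0.5}))
```

Under this reading, an ECR-based feature selection method would compare candidate state representations by the ECR of the policy each one induces, and lowering `temperature` moves stochastic execution toward the deterministic choice.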
