Wireless Personal Communications: An International Journal

Reinforcement Learning Based on Contextual Bandits for Personalized Online Learning Recommendation Systems



Abstract

Personalized online learning has been widely adopted in recent years and has become a promising instructional strategy. A key way to provide it is personalized recommendation: guiding students to suitable learning content at the right time. This is a nontrivial problem, however, because online learning environments are highly flexible: students learn independently, according to their own characteristics and situations. Existing recommendation methods do not work effectively in such environments. The objective of this study is therefore to provide personalized, dynamic, and continuous recommendations for online learning systems. We propose a method based on contextual bandits, a reinforcement-learning formulation that works effectively in dynamic environments. We use past student behaviors and the current student state as the contextual information from which the agent's policy selects the optimal action. We evaluate the proposed method on real data from an online learning system, comparing it with well-known reinforcement-learning baselines: epsilon-greedy, greedy with optimistic initial values, and the upper confidence bound (UCB) method. The results show that the proposed method significantly outperforms these baselines in our test case.
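The abstract does not detail the authors' algorithm, but the contextual-bandit setting it describes, choosing a learning item given a context vector built from student state and past behavior, can be illustrated with a minimal sketch. The version below uses LinUCB, a standard contextual-bandit algorithm; the class and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

class LinUCB:
    """Contextual bandit via LinUCB: one linear reward model per arm
    (learning item), choosing the arm with the highest upper confidence
    bound on expected reward given the student's context vector."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                               # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vector

    def select(self, x):
        """Pick an arm for context x (e.g. student state + behavior features)."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge-regression estimate
            # mean reward estimate plus an exploration bonus (UCB term)
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed (context, reward) pair into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In a recommendation loop, `select` would be called with the current student's feature vector and `update` with an observed engagement or performance signal, so the policy adapts continuously as student behavior changes.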
