
Learning by Observation: Comparison of Three Methods of Embedding Mentor's Knowledge in Reinforcement Learning Algorithms



Abstract

Using the knowledge of agents that already function successfully can help avoid the expensive exploration that is so vital in some domains of reinforcement learning. Three methods of embedding a mentor's knowledge are proposed: initialization of the Q function, reward shaping, and recording the mentor's decisions in a separate action-value function. The speed of convergence of these methods in combination with the Q-learning algorithm, under different amounts of information about the mentor's decisions, and their robustness to the quality of the mentor are compared on four domains from the "Reinforcement Learning Benchmarks and Bake-offs" suite for testing and comparing reinforcement learning algorithms.
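To make the three methods concrete, the sketch below shows one way they could be wired into tabular Q-learning. It is not the paper's implementation: the chain environment, the always-move-right mentor policy, the shaping bonus of 0.1, and all hyperparameters are illustrative assumptions.

```python
"""Minimal sketch of three ways to embed a mentor's knowledge in Q-learning:
1) initializing the Q function from the mentor's choices,
2) shaping the reward toward mentor-preferred actions,
3) keeping the mentor's decisions in a separate action-value table.
All environment and parameter choices below are assumptions for illustration."""
import random
from collections import defaultdict

N_STATES, GOAL = 10, 9        # simple chain world (assumed domain)
ACTIONS = [-1, +1]            # move left / move right

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def mentor(s):
    return +1                 # toy mentor: always moves toward the goal

def q_learning(method, episodes=200, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)
    Q_mentor = defaultdict(float)

    # Method 1: initialize Q optimistically along the mentor's decisions.
    if method == "init":
        for s in range(N_STATES):
            Q[(s, mentor(s))] = 1.0

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy; method 3 adds the mentor table to the ranking.
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                rank = lambda a: Q[(s, a)] + (Q_mentor[(s, a)] if method == "separate" else 0.0)
                a = max(ACTIONS, key=rank)

            s2, r, done = step(s, a)

            # Method 2: reward shaping -- small bonus for matching the mentor.
            if method == "shaping" and a == mentor(s):
                r += 0.1

            target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # Method 3: a separate action-value table tracks mentor agreement.
            if method == "separate":
                Q_mentor[(s, a)] += alpha * ((1.0 if a == mentor(s) else 0.0) - Q_mentor[(s, a)])
            s = s2
    return Q

if __name__ == "__main__":
    for m in ("init", "shaping", "separate"):
        Q = q_learning(m)
        greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
        print(m, "greedy actions:", greedy)
```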

Bibliographic record

Conference: AISB Convention
