Conference: National Conference on Artificial Intelligence

Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression

Abstract

We present a novel formulation for providing advice to a reinforcement learner that employs support-vector regression as its function approximator. Our new method extends a recent advice-giving technique, called Knowledge-Based Kernel Regression (KBKR), that accepts advice concerning a single action of a reinforcement learner. In KBKR, users can say that in some set of states, an action's value should be greater than some linear expression of the current state. In our new technique, which we call Preference KBKR (Pref-KBKR), the user can provide advice in a more natural manner by recommending that some action is preferred over another in the specified set of states. Specifying preferences essentially means that users are giving advice about policies rather than Q values, which is a more natural way for humans to present advice. We present the motivation for preference advice and a proof of the correctness of our extension to KBKR. In addition, we show empirical results that our method can make effective use of advice on a novel reinforcement-learning task, based on the RoboCup simulator, which we call Breakaway. Our work demonstrates the significant potential of advice-giving techniques for addressing complex reinforcement learning problems, while further demonstrating the use of support-vector regression for reinforcement learning.
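The two advice forms described in the abstract can be illustrated with a minimal Python sketch. It assumes linear Q-functions and an advice region given by linear inequalities B x <= d. These choices, the margin `beta`, and every name below (including the hypothetical `shoot` and `pass` actions and the two state features) are illustrative assumptions, not details taken from the paper. The sketch only measures how far a given model is from satisfying a piece of advice at one state; it does not reproduce the KBKR or Pref-KBKR optimization itself.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_region(B: Sequence[Vector], d: Vector, x: Vector) -> bool:
    """True when the state x satisfies every constraint B[i] . x <= d[i]."""
    return all(dot(row, x) <= bound for row, bound in zip(B, d))


def q_value(w: Vector, b: float, x: Vector) -> float:
    """Linear Q-function (an assumption of this sketch): Q(x) = w . x + b."""
    return dot(w, x) + b


def kbkr_violation(w: Vector, b: float, h: Vector, beta: float, x: Vector) -> float:
    """Single-action advice: Q_a(x) should be at least h . x + beta.

    Returns the shortfall at x, or 0.0 when the advice holds there.
    """
    return max(0.0, dot(h, x) + beta - q_value(w, b, x))


def pref_violation(w_p: Vector, b_p: float, w_n: Vector, b_n: float,
                   beta: float, x: Vector) -> float:
    """Preference advice: Q_p(x) should exceed Q_n(x) by at least beta.

    Returns the shortfall at x, or 0.0 when the preference holds there.
    """
    gap = q_value(w_p, b_p, x) - q_value(w_n, b_n, x)
    return max(0.0, beta - gap)


# Hypothetical example: features are (distance_to_goal, angle_to_goal).
# Advice region: distance_to_goal <= 10.
B = [[1.0, 0.0]]
d = [10.0]
x = [4.0, 30.0]

# Hypothetical linear models for two actions.
w_shoot, b_shoot = [-0.25, 0.0], 2.5
w_pass, b_pass = [0.0, 0.0], 1.0

if in_region(B, d, x):
    # "In this region, prefer shoot over pass by a margin of 1.0."
    shortfall = pref_violation(w_shoot, b_shoot, w_pass, b_pass, 1.0, x)
    print("preference advice shortfall:", shortfall)
```

The preference form never mentions a target Q value: the user only states which action should rank higher in the region, which matches the abstract's point that preferences are advice about policies rather than about Q values.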
