Deep Feedback Learning

Abstract

An agent acting in an environment aims to minimise uncertainty, so that being attacked can be predicted and rewards are not found only by chance. These events define an error signal that can be used to improve performance. In this paper we present a new algorithm in which an error signal from a reflex trains a novel deep network: the error is propagated forwards through the network, from its input to its output, in order to generate pro-active actions. We demonstrate the algorithm in two scenarios, a first-person shooter game and a driving-car scenario; in both cases the network develops strategies to become pro-active.
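The abstract's key idea, propagating a reflex-generated error forwards through the network rather than backwards, can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the layer structure, tanh activation, and Hebbian-style update rule (correlating each layer's stored input with the forward-travelling error) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class ForwardErrorLayer:
    """A layer that receives an error at its input side and passes it on
    towards the output, updating its weights along the way (illustrative)."""

    def __init__(self, n_in, n_out, lr=0.01):
        self.W = rng.normal(scale=0.1, size=(n_out, n_in))
        self.lr = lr
        self.x = np.zeros(n_in)

    def forward(self, x):
        # Ordinary forward pass of the activity signal.
        self.x = x
        return np.tanh(self.W @ x)

    def propagate_error(self, err):
        # The error travels in the same direction as the activity:
        # input -> output, through the current weights.
        err_out = self.W @ err
        # Hebbian-style update: correlate the forward-propagated error
        # with the input this layer saw (assumed learning rule).
        self.W += self.lr * np.outer(err_out, self.x)
        return err_out

# A tiny two-layer network driven by a reflex error signal.
layers = [ForwardErrorLayer(4, 8), ForwardErrorLayer(8, 2)]

x = rng.normal(size=4)          # sensory input
for layer in layers:
    x = layer.forward(x)
action = x                      # pro-active motor output

reflex_error = rng.normal(size=4)   # e.g. an obstacle-triggered reflex
e = reflex_error
for layer in layers:
    e = layer.propagate_error(e)    # error flows input -> output
```

Note the contrast with backpropagation: here no gradient is sent backwards from the output; the reflex error enters where the sensory input enters and drives the weight changes on its way forward.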
