2012 15th International Conference on Interactive Collaborative Learning

A modification of gradient policy in reinforcement learning procedure


Abstract

The gradient of a scalar function is used frequently across many areas of mathematics. In informatics it appears, for example, in the learning procedures of many control systems. The key observation is that the gradient, when it is a non-zero vector, points in the direction of the greatest rate of increase of the scalar function. In this contribution we show a method for determining the direction(s) of greatest increase even when the gradient is the zero vector. We show that this can be done using the knowledge students already have at their stage of study.
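The abstract does not say which technique the paper uses. Below is a minimal sketch that assumes the standard second-order (Taylor/Hessian) argument from introductory multivariable calculus. When ∇f(p) = 0, f(p + t·u) ≈ f(p) + (t²/2)·uᵀH(p)u, so the unit direction(s) of greatest increase are the eigenvector(s) of the Hessian H(p) for its largest eigenvalue, provided that eigenvalue is positive (u and -u are equally good). The function names, finite-difference step sizes, and tolerance are hypothetical choices for a two-variable f, and this is not presented as the authors' method.

```python
import math


def gradient(f, x, y, h=1e-5):
    # Central-difference estimate of (df/dx, df/dy) at (x, y).
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy


def hessian(f, x, y, h=1e-4):
    # Finite-difference estimate of the symmetric Hessian [[fxx, fxy], [fxy, fyy]].
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy


def ascent_direction(f, x, y, tol=1e-6):
    """Unit direction of greatest increase of f at (x, y), or None.

    Hypothetical helper: uses the normalized gradient when it is non-zero,
    otherwise the Hessian eigenvector for the largest eigenvalue.
    """
    gx, gy = gradient(f, x, y)
    norm = math.hypot(gx, gy)
    if norm > tol:
        # First-order case: the gradient itself is the steepest-ascent direction.
        return gx / norm, gy / norm

    # Zero gradient: fall back to the second-order term (t^2 / 2) * u^T H u.
    a, b, c = hessian(f, x, y)
    # Largest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]].
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if lam <= tol:
        # No second-order ascent (e.g. a local maximum); higher-order terms needed.
        return None

    # Each row of (H - lam*I) v = 0 yields a candidate eigenvector; keep the
    # numerically larger one to avoid a degenerate (0, 0) candidate.
    v1 = (b, lam - a)
    v2 = (lam - c, b)
    vx, vy = v1 if math.hypot(*v1) >= math.hypot(*v2) else v2
    n = math.hypot(vx, vy)
    if n <= tol:
        # H is (numerically) lam * I: every direction increases f equally.
        return 1.0, 0.0
    return vx / n, vy / n


if __name__ == "__main__":
    # Saddle of x^2 - y^2 at the origin: gradient is zero, ascent is along the x-axis.
    print(ascent_direction(lambda x, y: x**2 - y**2, 0.0, 0.0))
    # Saddle of x*y at the origin: ascent is along the diagonal y = x.
    print(ascent_direction(lambda x, y: x * y, 0.0, 0.0))
```

If the largest Hessian eigenvalue is not positive (for example at a strict local maximum, or when the Hessian vanishes as for f(x, y) = x³ at the origin), this second-order test returns None, and higher-order terms of the Taylor expansion would be needed to find an ascent direction.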
