A modification of gradient policy in reinforcement learning procedure

Abstract

The gradient of a scalar function is frequently used in various areas of mathematics. In informatics it can be used, for example, in the learning procedure of many control systems. The key observation is that the gradient, if it is a non-zero vector, points in the direction of the greatest rate of increase of the scalar function. In this contribution we show a method for determining such direction(s) even if the gradient is the zero vector. We show that this can be done with the knowledge that students have at their stage of study.
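
The abstract does not describe the authors' method, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the normalized gradient when that is non-zero and otherwise probes unit directions around the point to find where the function grows most. The function names, the two-variable restriction, the sampling fallback, and the tolerances are all illustrative assumptions, not taken from the paper.

```python
import math


def numerical_gradient(f, x, y, h=1e-5):
    """Central-difference estimate of the gradient of f at (x, y)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy


def ascent_direction(f, x, y, tol=1e-8, step=1e-3, samples=360):
    """Return a unit vector along which f grows most near (x, y).

    At a local maximum no direction increases f; the result is then
    simply the direction of smallest decrease.
    """
    gx, gy = numerical_gradient(f, x, y)
    norm = math.hypot(gx, gy)
    if norm > tol:
        # Regular case: the normalized gradient is the steepest-ascent direction.
        return gx / norm, gy / norm
    # Degenerate case (assumed fallback, NOT the authors' method):
    # probe evenly spaced unit directions and keep the largest increase.
    best, best_gain = None, -math.inf
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        dx, dy = math.cos(angle), math.sin(angle)
        gain = f(x + step * dx, y + step * dy) - f(x, y)
        if gain > best_gain:
            best, best_gain = (dx, dy), gain
    return best


def saddle(x, y):
    """Example function whose gradient vanishes at the origin."""
    return x ** 2 - y ** 2


direction_at_saddle = ascent_direction(saddle, 0.0, 0.0)
direction_regular = ascent_direction(saddle, 1.0, 0.0)
```

For the saddle `x**2 - y**2` the gradient is zero at the origin, yet the function still increases along the x-axis in both senses, which is why the abstract speaks of "direction(s)". This sketch returns only the first best direction it finds.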

Bibliographic record

Source
International Conference on Interactive Collaborative Learning; International Conference on Engineering Pedagogy | 2012 | 2 pages
Conference location
Authors
Abas Marcel; Skripcak Tomas;
Author affiliation

Conference organizer
Original format: PDF
Language of text
Chinese Library Classification: TP3-53
Keywords
control system; direction of greatest rate; gradient policy; neuron networks;

Similar documents

1. An Algorithm of Policy Gradient Reinforcement Learning with a Fuzzy Controller in Policies [J]. Harukazu Igarashi, Seiji Ishihara. International Journal of Artificial Intelligence and Expert Systems (IJAE), 2013, No. 1.
2. Deep reinforcement learning collision avoidance using policy gradient optimisation and Q-learning [J]. Shady A. Maged, Bishoy H. Mikhail. International journal of computational vision and robotics, 2020, No. 3.
3. Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy [J]. van Rooijen J. C., Grondman I., Babuska R. Mechatronics: The Science of Intelligent Machines, 2014, No. 8.
4. A modification of gradient policy in reinforcement learning procedure [C]. Abas Marcel, Skripcak Tomas. 2012 15th International Conference on Interactive Collaborative Learning, 2012.
5. Explaining Collective Behavior with Dynamical Systems: Spatial Gradient Sensing in Eukaryotic Chemotaxis and Learning Dynamics in Multiagent Reinforcement Learning [D]. Shams, Daniel. 2019.
6. Correction: Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail [O]. Eleni Vasilaki, Nicolas Frémaux, Robert Urbanczik, 2009.
7. Policy Gradient Reinforcement Learning with a Fuzzy Controller for Policy: Decision Making in RoboCup Soccer Small Size League [O]. Masaya SUGIMOTO, Harukazu IGARASHI, Seiji ISHIHARA, 2014.