A modification of gradient policy in reinforcement learning procedure

Abstract

The gradient of a scalar function is used frequently in many areas of mathematics. In informatics it can be used, for example, in the learning procedure of many control systems. The key observation is that the gradient, when it is a non-zero vector, points in the direction of the greatest rate of increase of the scalar function. In this contribution we show a method for determining such direction(s) even when the gradient is the zero vector. We show that this can be done with the knowledge students already have at their stage of study.
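
The abstract does not describe the authors' method, so the following is only an illustrative sketch of the underlying idea, not a reconstruction of the paper. It assumes a two-variable function and uses hypothetical helper names (`numerical_gradient`, `ascent_direction_2d`): when the estimated gradient is non-zero, the normalized gradient is returned; when it vanishes, the code simply probes unit directions around the point and keeps the one along which the function grows most.

```python
import math

def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at point x."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        forward[i] += h
        backward = list(x)
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

def ascent_direction_2d(f, x, tol=1e-8, probe=1e-3, samples=360):
    """Unit direction of greatest increase of f at x (2-D only).

    Assumption for illustration: if the gradient is (numerically) zero,
    probe evenly spaced unit directions and return the one with the
    largest increase in f, or None if no probed direction increases f.
    """
    g = numerical_gradient(f, x)
    norm = math.hypot(g[0], g[1])
    if norm > tol:
        # Non-zero gradient: it already points along the steepest ascent.
        return [g[0] / norm, g[1] / norm]
    best, best_gain = None, 0.0
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        u = [math.cos(angle), math.sin(angle)]
        gain = f([x[0] + probe * u[0], x[1] + probe * u[1]]) - f(x)
        if gain > best_gain:
            best, best_gain = u, gain
    return best

# Saddle point: the gradient of x^2 - y^2 vanishes at the origin,
# yet the function still increases along the x-axis.
saddle = lambda p: p[0] ** 2 - p[1] ** 2
direction = ascent_direction_2d(saddle, [0.0, 0.0])
```

At a local maximum, such as the origin of `-(x**2 + y**2)`, no probed direction increases the function and the sketch returns `None`. Probing is a deliberately simple choice; a second-order analysis of the function would be a more precise alternative.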

Bibliographic details

Source
2012 15th International Conference on Interactive Collaborative Learning | 2012 | pp. 1-2 | 2 pages
Conference location: Villach (AT)
Authors
Abas Marcel; Skripcak Tomas
Author affiliation

Institute of Applied Informatics, Automation and Mathematics, Faculty of Materials Science and Technology in Trnava, Slovak University of Technology in Bratislava, Trnava, Slovak Republic

Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
control system; direction of greatest rate; gradient policy; neuron networks;

Similar literature

1. An Algorithm of Policy Gradient Reinforcement Learning with a Fuzzy Controller in Policies [J]. Harukazu Igarashi, Seiji Ishihara. International Journal of Artificial Intelligence and Expert Systems (IJAE). 2013, No. 1
2. Deep reinforcement learning collision avoidance using policy gradient optimisation and Q-learning [J]. Shady A. Maged, Bishoy H. Mikhail. International journal of computational vision and robotics. 2020, No. 3
3. Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy [J]. van Rooijen J. C., Grondman I., Babuska R. Mechatronics: The Science of Intelligent Machines. 2014, No. 8
4. A modification of gradient policy in reinforcement learning procedure [C]. Abas Marcel, Skripcak Tomas. International Conference on Interactive Collaborative Learning; International Conference on Engineering Pedagogy. 2012
5. Explaining Collective Behavior with Dynamical Systems: Spatial Gradient Sensing in Eukaryotic Chemotaxis and Learning Dynamics in Multiagent Reinforcement Learning [D]. Shams, Daniel. 2019
6. Correction: Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail [O]. Eleni Vasilaki, Nicolas Frémaux, Robert Urbanczik, 2009
7. Policy Gradient Reinforcement Learning with a Fuzzy Controller for Policy: Decision Making in RoboCup Soccer Small Size League [O]. Masaya SUGIMOTO, Harukazu IGARASHI, Seiji ISHIHARA, 2014
