SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence

CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning


Abstract

Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment. Learning typically happens through trial and error using explorative methods such as ε-greedy. Two approaches, model-based and model-free reinforcement learning, have shown concrete results in several disciplines. Model-based RL learns a model of the environment and uses it to learn the policy, while model-free approaches are purely explorative and exploitative without considering the underlying environment dynamics. Model-free RL works conceptually well in simulated environments, and empirical evidence suggests that trial and error leads to near-optimal behavior with enough training. Model-based RL, on the other hand, aims to be sample efficient, and studies show that it requires far less training in the real environment to learn a good policy. A significant challenge with RL is that it relies on a well-defined reward function to work well in complex environments, and such a reward function is challenging to define. Goal-Directed RL is an alternative method that learns an intrinsic reward function, with emphasis on a few explored trajectories that reveal the path to the goal state. This paper introduces a novel reinforcement learning algorithm for predicting the distance between two states in a Markov Decision Process. The learned distance function works as an intrinsic reward that fuels the agent's learning. Using the distance metric as a reward, we show that the algorithm performs comparably to model-free RL while being significantly more sample-efficient in several test environments.
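
The abstract describes the core mechanism only at a high level: a learned function predicts the distance between two states, and that prediction serves as an intrinsic reward. The sketch below illustrates one plausible reading of that idea in PyTorch. The names (DistanceNet, distance_training_loss, intrinsic_reward), the architecture, and the step-count regression target are assumptions made for illustration; they are not taken from the paper itself.

```python
# Hypothetical sketch only: a small network that predicts the distance
# (expected number of steps) between two states, used to shape an intrinsic
# reward. Architecture and training target are assumptions, not the paper's
# actual CostNet specification.
import torch
import torch.nn as nn


class DistanceNet(nn.Module):
    """Predicts a non-negative distance estimate d(s, s') between two states."""

    def __init__(self, state_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Softplus(),  # keep the predicted distance non-negative
        )

    def forward(self, state, other_state):
        return self.net(torch.cat([state, other_state], dim=-1)).squeeze(-1)


def distance_training_loss(dist_net, states, i, j):
    """Regress the prediction toward the observed step gap |j - i| between
    two states drawn from the same trajectory (an assumed training signal)."""
    target = torch.tensor(float(abs(j - i)))
    pred = dist_net(states[i], states[j])
    return nn.functional.mse_loss(pred, target)


def intrinsic_reward(dist_net, state, next_state, goal_state):
    """One plausible shaping: reward the decrease in predicted distance to the goal."""
    with torch.no_grad():
        return (dist_net(state, goal_state) - dist_net(next_state, goal_state)).item()
```

Under these assumptions, an agent would add the intrinsic term to (or substitute it for) the environment reward while training with any standard model-free algorithm, so that progress toward the goal state is rewarded even when the external reward is sparse.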
