Neurocomputing

A temporal difference method for multi-objective reinforcement learning



Abstract

This work describes MPQ-learning, an algorithm that approximates the set of all deterministic non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for an objective to maximize. MPQ-learning directly generalizes the ideas of Q-learning to the multi-objective case. It can be applied to non-convex Pareto frontiers and finds both supported and unsupported solutions. We present the results of applying MPQ-learning to some benchmark problems. The algorithm solves these problems successfully, showing the feasibility of the approach. We also compare MPQ-learning to a standard linearization procedure that computes only supported solutions, and show that in some cases MPQ-learning can be as effective as the scalarization method. (C) 2017 Elsevier B.V. All rights reserved.
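The abstract contrasts Pareto dominance with linear scalarization: a linearization baseline can only recover "supported" solutions (points on the convex hull of the Pareto front), while Pareto-optimal points inside a non-convex frontier remain invisible to it. The sketch below is a rough Python illustration of those two building blocks only, not the paper's MPQ-learning update itself; all function names here are hypothetical.

import numpy as np

def dominates(u, v):
    """True if vector return u Pareto-dominates v
    (u >= v componentwise, with at least one strict inequality)."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(vectors):
    """Keep only the non-dominated vector-valued returns."""
    return [u for u in vectors
            if not any(dominates(v, u) for v in vectors if v is not u)]

def scalarize(vectors, weights):
    """Linear scalarization baseline: pick the return maximizing w . u.
    It can only select supported (convex-hull) solutions."""
    w = np.asarray(weights)
    return max(vectors, key=lambda u: float(w @ np.asarray(u)))

# Two-objective example with a non-convex frontier: (1, 1) lies below
# the segment between (3, 0) and (0, 3), so it is Pareto-optimal but
# unsupported; no choice of non-negative weights ever selects it.
points = [np.array([3.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 3.0])]
print([p.tolist() for p in pareto_front(points)])  # all three survive
print(scalarize(points, [0.5, 0.5]).tolist())      # picks an extreme point

This is why an algorithm that maintains sets of non-dominated vector estimates, as MPQ-learning does, can find unsupported solutions that any scalarization-based procedure misses.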


