A Sojourn-Based Approach to Semi-Markov Reinforcement Learning

Giacomo Ascione; Salvatore Cuomo

Abstract

In this paper we introduce a new approach to discrete-time semi-Markov decision processes based on the sojourn time process. Different characterizations of discrete-time semi-Markov processes are exploited, and decision processes are constructed by means of them. With this new approach, the agent is allowed to choose different actions depending also on the sojourn time of the process in the current state. A numerical method based on Q-learning algorithms for finite-horizon reinforcement learning and stochastic recursive relations is investigated. Finally, we consider two toy examples: one in which the reward depends on the sojourn time, according to the gambler's fallacy; the other in which the environment is semi-Markov even though the reward function does not depend on the sojourn time. These are used to carry out some numerical evaluations of the previously presented Q-learning algorithm and of a different, naive method based on deep reinforcement learning.
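The idea of letting the agent act on the pair (state, sojourn time) can be illustrated with a minimal sketch of tabular finite-horizon Q-learning. This is not the authors' algorithm or either of their toy examples: the two-state environment, the function names `step` and `q_learning`, the leave probabilities, the reward, and all hyperparameters are assumptions made purely for illustration. The only point it shows is that the Q-table is indexed by time, state, sojourn time, and action, so the learned policy may differ for different sojourn times in the same state.

```python
import random
from collections import defaultdict

HORIZON = 10       # finite horizon: number of decision steps (assumed value)
ACTIONS = (0, 1)   # hypothetical action labels


def step(x, g, a, rng):
    """Hypothetical toy dynamics, not taken from the paper.

    x is the current state (0 or 1) and g its sojourn time so far.
    Under action 1 the chance of leaving x grows with g, so the state
    process alone is not Markov, while the pair (x, g) is.
    """
    p_leave = min(1.0, 0.2 * (g + 1)) if a == 1 else 0.1
    if rng.random() < p_leave:
        x_next, g_next = 1 - x, 0   # jump: the sojourn time resets
    else:
        x_next, g_next = x, g + 1   # no jump: the sojourn time grows
    reward = 1.0 if x_next == 1 else 0.0  # assumed reward: being in state 1
    return x_next, g_next, reward


def q_learning(episodes=5000, alpha=0.1, eps=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning on the augmented state (t, x, g)."""
    rng = random.Random(seed)
    # Keys are (t, x, g, a). Missing entries default to 0.0, which also
    # encodes the zero terminal value at t == HORIZON.
    Q = defaultdict(float)
    for _ in range(episodes):
        x, g = 0, 0
        for t in range(HORIZON):
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(t, x, g, b)])
            x_next, g_next, r = step(x, g, a, rng)
            best_next = max(Q[(t + 1, x_next, g_next, b)] for b in ACTIONS)
            # Undiscounted finite-horizon temporal-difference update.
            Q[(t, x, g, a)] += alpha * (r + best_next - Q[(t, x, g, a)])
            x, g = x_next, g_next
    return Q
```

After training, the greedy action for a given time `t`, state `x`, and sojourn time `g` can be read off as `max(ACTIONS, key=lambda b: Q[(t, x, g, b)])`. Comparing the result for several values of `g` with `x` fixed shows whether the learned policy actually uses the sojourn time.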

Bibliographic record

Source: Journal of Scientific Computing, 2022, Issue 2, pp. 1-44 (44 pages)
Authors: Giacomo Ascione; Salvatore Cuomo
Affiliation: Università degli Studi di Napoli Federico II
Original format: PDF
Language: English
Keywords: Semi-Markov chains; Dynamic Programming Principle; Q-learning algorithms; Optimal policy
