Markov decision processes are the most popular stochastic sequential decision processes in reinforcement learning, used to represent the framework of interaction between an agent and an environment. A Markov decision process is frequently assumed to be stationary and ergodic, but most stochastic sequential decision processes arising in reinforcement learning are, in fact, not necessarily Markovian, stationary, or ergodic. In this paper, we show that an information-spectrum property plays an important role in return maximization in processes more general than stationary and ergodic Markov decision processes. We also present a class of stochastic sequential decision processes satisfying a necessary condition for return maximization, and we provide several examples of best sequences, in the sense of return maximization, within this class.