In this note we discuss algorithmic procedures for finding optimal policies of Markov decision chains with respect to various mean-variance optimality criteria. To this end, we present formulas for the growth rate and asymptotic behavior of the variance of the total cumulative reward. Finally, we discuss procedures of policy iteration type for finding efficient policies with respect to these criteria, along with computational experience.