The authors consider a specific multidimensional stochastic approximation scheme of the Robbins-Monro type that arises naturally in the study of steering policies for Markov decision processes. The usual almost sure convergence results do not seem to apply to this simple scheme. Almost sure convergence is instead established by an indirect argument that combines standard results on stochastic approximation with a version of the law of large numbers for martingale differences. These convergence properties also provide an alternative proof of some of the properties of steering policies.
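The abstract does not state the authors' recursion, so the sketch below shows only a generic multidimensional Robbins-Monro iteration, not the scheme analysed in the paper. It assumes the update θ_{n+1} = θ_n + a_n (Y_{n+1} − θ_n) with step size a_n = 1/(n+1). With this particular step size the iterate equals the running sample average of the observations. That makes the link between stochastic approximation and a law of large numbers concrete. The function name `robbins_monro_mean`, the Gaussian noise, and the target vector are illustrative assumptions.

```python
import random


def robbins_monro_mean(samples, theta0):
    """Generic multidimensional Robbins-Monro iteration (illustrative sketch).

    Update: theta_{n+1} = theta_n + a_n * (Y_{n+1} - theta_n), with the
    assumed step size a_n = 1/(n+1). Because a_0 = 1, the initial value
    theta0 is overwritten by the first sample, and after n samples theta
    equals their sample mean.
    """
    theta = list(theta0)
    for n, y in enumerate(samples):
        step = 1.0 / (n + 1)  # a_n: sum of a_n diverges, sum of a_n^2 converges
        theta = [t + step * (yi - t) for t, yi in zip(theta, y)]
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    target = (0.3, 0.7)  # hypothetical limit point for the illustration
    # Observations = target + zero-mean noise (i.i.d., hence martingale differences)
    samples = [(target[0] + rng.gauss(0, 1), target[1] + rng.gauss(0, 1))
               for _ in range(100_000)]
    print(robbins_monro_mean(samples, (5.0, -5.0)))  # close to (0.3, 0.7)
```

Because the noise terms here are martingale differences, almost sure convergence of this toy iterate follows directly from the strong law of large numbers. According to the abstract, the paper needs a more indirect argument for its own scheme.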