In this paper we consider weighted reward discounted Markov decision processes (MDPs, for short) with perturbation. We prove the existence of an optimal simple ultimately deterministic policy for the process Γ_0(β_1, …, β_k). We also prove that there exists a δ-optimal simple ultimately deterministic policy in the perturbed weighted MDP Γ_d(β_1, …, β_k) for all d ∈ [0, ε*). Finally, we prove the following result: if π is an optimal policy of Γ_d(β_1, …, β_k), then for any δ > 0 there exists an ε(δ)-neighbourhood B(d) such that whenever d_1 ∈ B(d), π is a δ-optimal policy of Γ_{d_1}(β_1, …, β_k).