Decision-making in uncertain environments is a fundamental problem in artificial intelligence, and Markov decision processes (MDPs) have become a popular model for non-deterministic planning problems with full observability. An MDP assumes discrete states and discrete actions, and can be viewed as a stochastic automaton in which an agent's actions have uncertain effects. These uncertain action outcomes induce stochastic transitions between states, and the expected value of a chosen action is a function of the transitions it induces. On executing an action, the agent receives a reward and causes a change in the state of the environment. The agent's objective is to choose actions so as to maximize the cumulative future reward over time. In practice, Value Iteration (VI) is probably the best-known and most widely used method for solving MDPs.
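As a concrete illustration of VI, the following is a minimal sketch in Python, not an implementation from this paper. The tabular transition model `P`, reward array `R`, discount factor `gamma`, and convergence threshold `theta` are illustrative assumptions; the core step is the standard Bellman optimality backup, which repeatedly updates the value of each state with the best expected one-step return.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, theta=1e-6):
    """Minimal value iteration sketch (illustrative assumptions, not from the text).

    P:     transition probabilities, shape (S, A, S'); P[s, a, s2] = Pr(s2 | s, a)
    R:     expected immediate rewards, shape (S, A)
    gamma: discount factor in [0, 1)
    theta: convergence threshold on the Bellman residual
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup:
        #   Q(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
        Q = R + gamma * (P @ V)          # shape (S, A)
        V_new = Q.max(axis=1)            # greedy over actions
        if np.max(np.abs(V_new - V)) < theta:
            break
        V = V_new
    policy = Q.argmax(axis=1)            # greedy policy w.r.t. converged values
    return V_new, policy
```

With a discount factor below one, the backup is a contraction, so the loop is guaranteed to drive the Bellman residual below `theta` and the greedy policy extracted at convergence is (near-)optimal for the given model.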