International Symposium on Robotics Research

Reachability and Differential Based Heuristics for Solving Markov Decision Processes



Abstract

Decision-making in uncertain environments is a basic problem in the area of artificial intelligence, and Markov decision processes (MDPs) have become very popular for modeling non-deterministic planning problems with full observability. Specifically, an MDP assumes discrete states and discrete actions, and can be viewed as a stochastic automaton in which an agent's actions have uncertain effects. Such uncertain action outcomes induce stochastic transitions between states, and the expected value of a chosen action is a function of the transitions it induces. On executing an action, the agent receives a reward and causes a change in the state of the environment. The objective of the agent is to choose actions so as to maximize the cumulative future reward over a period of time. In practice, Value Iteration (VI) is probably the best-known and most widely used method for solving MDPs.
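As context for the abstract, the Value Iteration method it mentions can be sketched as follows. This is a minimal illustrative implementation, not the paper's algorithm; the three-state MDP, action names, and discount factor are hypothetical, chosen only to show the Bellman optimality backup that VI repeats until convergence.

```python
# Minimal Value Iteration sketch on a small, hypothetical MDP.
# transitions[s][a] is a list of (probability, next_state, reward) triples,
# encoding the stochastic effect of taking action a in state s.
transitions = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(0.9, 2, 1.0), (0.1, 1, 0.0)]},
    2: {"stay": [(1.0, 2, 0.0)]},  # absorbing terminal state
}

def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality backup until the value estimates converge."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Expected value of each action under the current estimate V.
            q_values = [
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            ]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration(transitions)
```

Each sweep applies the backup `V(s) <- max_a sum_{s'} P(s'|s,a) [R(s,a,s') + gamma * V(s')]`; because the backup is a contraction for `gamma < 1`, the estimates converge to the optimal value function regardless of initialization.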
