European Starting AI Researcher Symposium

Solving MDPs with Unknown Rewards Using Nondominated Vector-Valued Functions

Abstract

This paper addresses the vectorial form of Markov Decision Processes (MDPs) in order to solve MDPs with unknown rewards. Our method for finding optimal strategies reduces the computation to the determination of two separate polytopes: the first is the set of admissible vector-valued functions, and the second is the set of admissible weight vectors. Unknown weight vectors are discovered through interaction with an agent holding a set of preferences. Unlike most existing algorithms for reward-uncertain MDPs, our approach does not require interaction with the user while optimal policies are generated. Instead, we use a variant of approximate value iteration on vector-valued MDPs, based on classifying advantages, which allows us to approximate the set of nondominated policies regardless of user preferences. Since any agent's optimal policy comes from this set, we propose an algorithm that discovers within it an approximately optimal policy matching the user's priorities while interactively narrowing the weight polytope.
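To make the two ingredients of the abstract concrete, here is a minimal sketch, not the authors' implementation: (1) value iteration over vector-valued value functions, pruning dominated vectors so that the surviving set approximates the nondominated policies for every possible weight vector, and (2) interactive narrowing of the weight polytope through pairwise preference queries, each answer adding one half-space constraint. The toy MDP, reward vectors, and the hidden "true" user weight are all hypothetical, and plain Pareto pruning stands in for the paper's advantage-classification machinery.

```python
"""Illustrative sketch of nondominated vector-valued value iteration
plus interactive weight-polytope narrowing. All problem data are toy
values chosen for the example, not from the paper."""
import itertools
import numpy as np

S, A, D = 2, 2, 2          # states, actions, reward dimensions (toy sizes)
GAMMA = 0.9
# P[s][a]: list of (next_state, probability); R[s][a]: reward vector in R^D.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])},
     1: {0: np.array([0.5, 0.5]), 1: np.array([0.2, 0.8])}}

def prune(vectors):
    """Keep only Pareto-nondominated vectors (componentwise dominance)."""
    kept = []
    for i, v in enumerate(vectors):
        dominated = any(i != j and np.all(u >= v) and np.any(u > v)
                        for j, u in enumerate(vectors))
        if not dominated:
            kept.append(v)
    return kept

def vector_value_iteration(iters=40, max_set=50):
    """Approximate, per state, the set of nondominated value vectors."""
    V = {s: [np.zeros(D)] for s in range(S)}
    for _ in range(iters):
        V_new = {}
        for s in range(S):
            candidates = []
            for a in range(A):
                succ = P[s][a]
                # One candidate per choice of a value vector at each successor.
                for choice in itertools.product(*(V[sp] for sp, _ in succ)):
                    future = sum(p * v for (_, p), v in zip(succ, choice))
                    candidates.append(R[s][a] + GAMMA * future)
            V_new[s] = prune(candidates)[:max_set]   # cap set size to stay tractable
        V = V_new
    return V

# Interactive narrowing of the weight polytope: each answer to a pairwise
# query "is value vector u preferable to v?" adds the half-space
# w . (u - v) >= 0. Here the user is simulated by a hidden weight vector.
true_w = np.array([0.7, 0.3])            # hypothetical hidden preference
V = vector_value_iteration()
halfspaces, cands = [], list(V[0])       # query over state 0's nondominated set
while len(cands) > 1:
    u, v = cands[0], cands[1]
    better, worse = (u, v) if true_w @ u >= true_w @ v else (v, u)
    halfspaces.append(better - worse)    # half-space normal narrowing the polytope
    cands = [c for c in cands if c is not worse]
print("selected value vector:", cands[0])
print("accumulated half-space constraints:", len(halfspaces))
```

Note that the sketch computes the nondominated set entirely before any user interaction, matching the abstract's claim that policy generation needs no user input; only the final selection step queries preferences, and each query permanently shrinks the feasible weight polytope.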
