European Starting AI Researcher Symposium

Solving MDPs with Unknown Rewards Using Nondominated Vector-Valued Functions

Abstract

This paper addresses the vectorial form of Markov Decision Processes (MDPs) in order to solve MDPs with unknown rewards. Our method for finding optimal strategies reduces the computation to the determination of two separate polytopes: the first is the set of admissible vector-valued functions, and the second is the set of admissible weight vectors. Unknown weight vectors are discovered through interaction with an agent holding a set of preferences. Unlike most existing algorithms for reward-uncertain MDPs, our approach does not require interaction with the user while optimal policies are generated. Instead, we use a variant of approximate value iteration on vector-valued MDPs, based on classifying advantages, which allows us to approximate the set of nondominated policies regardless of user preferences. Since any agent's optimal policy comes from this set, we propose an algorithm that discovers within it an approximately optimal policy matching the user's priorities while interactively narrowing the weight polytope.
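To make the two ingredients of the abstract concrete, here is a minimal sketch, not the authors' implementation: (1) value iteration over vector-valued value functions, pruning dominated vectors so that the surviving set approximates the nondominated policies for every possible weight vector, and (2) interactive narrowing of the weight polytope through pairwise preference queries, each answer adding one half-space constraint. The toy MDP, reward vectors, and the hidden "true" user weight are all hypothetical, and plain Pareto pruning stands in for the paper's advantage-classification machinery.

```python
"""Illustrative sketch of nondominated vector-valued value iteration
plus interactive weight-polytope narrowing. All problem data are toy
values chosen for the example, not from the paper."""
import itertools
import numpy as np

S, A, D = 2, 2, 2          # states, actions, reward dimensions (toy sizes)
GAMMA = 0.9
# P[s][a]: list of (next_state, probability); R[s][a]: reward vector in R^D.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])},
     1: {0: np.array([0.5, 0.5]), 1: np.array([0.2, 0.8])}}

def prune(vectors):
    """Keep only Pareto-nondominated vectors (componentwise dominance)."""
    kept = []
    for i, v in enumerate(vectors):
        dominated = any(i != j and np.all(u >= v) and np.any(u > v)
                        for j, u in enumerate(vectors))
        if not dominated:
            kept.append(v)
    return kept

def vector_value_iteration(iters=40, max_set=50):
    """Approximate, per state, the set of nondominated value vectors."""
    V = {s: [np.zeros(D)] for s in range(S)}
    for _ in range(iters):
        V_new = {}
        for s in range(S):
            candidates = []
            for a in range(A):
                succ = P[s][a]
                # One candidate per choice of a value vector at each successor.
                for choice in itertools.product(*(V[sp] for sp, _ in succ)):
                    future = sum(p * v for (_, p), v in zip(succ, choice))
                    candidates.append(R[s][a] + GAMMA * future)
            V_new[s] = prune(candidates)[:max_set]   # cap set size to stay tractable
        V = V_new
    return V

# Interactive narrowing of the weight polytope: each answer to a pairwise
# query "is value vector u preferable to v?" adds the half-space
# w . (u - v) >= 0. Here the user is simulated by a hidden weight vector.
true_w = np.array([0.7, 0.3])            # hypothetical hidden preference
V = vector_value_iteration()
halfspaces, cands = [], list(V[0])       # query over state 0's nondominated set
while len(cands) > 1:
    u, v = cands[0], cands[1]
    better, worse = (u, v) if true_w @ u >= true_w @ v else (v, u)
    halfspaces.append(better - worse)    # half-space normal narrowing the polytope
    cands = [c for c in cands if c is not worse]
print("selected value vector:", cands[0])
print("accumulated half-space constraints:", len(halfspaces))
```

Note that the sketch computes the nondominated set entirely before any user interaction, matching the abstract's claim that policy generation needs no user input; only the final selection step queries preferences, and each query permanently shrinks the feasible weight polytope.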
