MDPs with Non-Deterministic Policies


Abstract

Markov Decision Processes (MDPs) have been extensively studied and used in the context of planning and decision-making, and many methods exist to find the optimal policy for problems modelled as MDPs. Although finding the optimal policy is sufficient in many domains, in certain applications such as decision support systems where the policy is executed by a human (rather than a machine), finding all possible near-optimal policies might be useful as it provides more flexibility to the person executing the policy. In this paper we introduce the new concept of non-deterministic MDP policies, and address the question of finding near-optimal non-deterministic policies. We propose two solutions to this problem, one based on a Mixed Integer Program and the other one based on a search algorithm. We include experimental results obtained from applying this framework to optimize treatment choices in the context of a medical decision support system.
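To make the idea of a non-deterministic policy concrete, the following is a minimal sketch in which each state maps to a set of acceptable actions rather than a single one. It does not reproduce the Mixed Integer Program or the search algorithm mentioned in the abstract. It swaps in a simpler heuristic: compute the optimal action values by value iteration, then keep every action whose value is within a fixed slack of the best one. The function names, the `slack` parameter, and the toy two-state MDP are all assumptions made for illustration.

```python
import numpy as np


def q_value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Compute optimal action values for a small tabular MDP.

    P: array of shape (A, S, S), P[a, s, t] = probability of s -> t under a.
    R: array of shape (S, A), immediate reward for taking a in s.
    Returns an (S, A) array approximating Q*.
    """
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)
        # Bellman optimality backup:
        # Q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) * V(t)
        Q_new = R + gamma * np.einsum("ast,t->sa", P, V)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
    return Q


def near_optimal_action_sets(Q, slack):
    """Hypothetical helper: per state, keep actions within `slack` of the best.

    This is a thresholding heuristic, not the method from the paper.
    """
    best = Q.max(axis=1)
    return [
        set(np.flatnonzero(row >= b - slack).tolist())
        for row, b in zip(Q, best)
    ]


# Toy MDP (illustrative only): 2 states, 3 actions, every action is a
# self-loop, so the states are independent of each other.
P = np.stack([np.eye(2)] * 3)
R = np.array([
    [1.0, 0.95, 0.20],   # state 0: actions 0 and 1 are nearly tied
    [0.0, 0.50, 0.49],   # state 1: actions 1 and 2 are nearly tied
])

Q_star = q_value_iteration(P, R, gamma=0.9)
action_sets = near_optimal_action_sets(Q_star, slack=0.1)
print(action_sets)
```

In this toy case the person executing the policy may pick either of two actions in each state. Note that the per-state slack is not the overall loss: picking a slightly worse action at every step compounds over time, which is why a guarantee on the value of the whole policy calls for more careful methods such as those the abstract describes.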


Bibliographic details

Journal: other
Authors: Mahdi Milani Fard; Joelle Pineau
Volume: 21 (year and issue not recorded)
Pages: 1065–1073
Total pages: 16
Original format: PDF

