Journal: Computers & Operations Research

Risk-sensitive control of Markov decision processes: A moment-based approach with target distributions


Abstract

In many revenue management applications, risk-averse decision-making is crucial. In dynamic settings, however, it is challenging to find the right balance between maximizing expected rewards and minimizing various kinds of risk. Existing approaches use utility functions, chance constraints, or (conditional) value-at-risk considerations to shape the distribution of rewards in a preferred way. Nevertheless, common techniques are not flexible enough and are typically numerically complex. In our model, we exploit the fact that a distribution is characterized by its mean and higher moments. We present a multi-valued dynamic programming heuristic to compute risk-sensitive feedback policies that are able to directly control the moments of future rewards. Our approach is based on recursive formulations of higher moments and does not require an extension of the state space. Finally, we propose a self-tuning algorithm that identifies feedback policies approximating predetermined (risk-sensitive) target distributions. We illustrate the effectiveness and flexibility of our approach for different dynamic pricing scenarios. (C) 2020 Elsevier Ltd. All rights reserved.
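
The abstract's remark that higher moments admit recursive formulations on the original state space can be illustrated with a small sketch. The code below is not the authors' heuristic. It only evaluates the first two moments of the total reward under a fixed policy in a finite-horizon Markov chain, using the standard identity G = r + G', hence G^2 = r^2 + 2 r G' + G'^2. The function name `policy_moments`, the matrices `P` and `R`, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def policy_moments(P, R, horizon):
    """First and second moments of the total reward under a fixed policy.

    P[s, s2]: probability of moving from state s to s2 under the policy.
    R[s, s2]: reward earned on that transition.
    Returns (m1, m2), the arrays E[G | s] and E[G^2 | s] at time 0.
    """
    n = P.shape[0]
    m1 = np.zeros(n)  # E[G] with zero steps remaining
    m2 = np.zeros(n)  # E[G^2] with zero steps remaining
    for _ in range(horizon):
        # Backward step: G = r + G', so G^2 = r^2 + 2*r*G' + G'^2.
        # Given the next state s2, r is fixed, so E[r*G' | s2] = r * m1[s2].
        new_m1 = (P * (R + m1[None, :])).sum(axis=1)
        new_m2 = (P * (R**2 + 2 * R * m1[None, :] + m2[None, :])).sum(axis=1)
        m1, m2 = new_m1, new_m2
    return m1, m2

# Hypothetical two-state example: state 0 earns 1 while it survives,
# state 1 is absorbing with zero reward.
P = np.array([[0.5, 0.5],
              [0.0, 1.0]])
R = np.array([[1.0, 0.0],
              [0.0, 0.0]])
mean, second = policy_moments(P, R, horizon=2)
variance = second - mean**2
```

Both moments are stored per state and per time step only, so the state space is not extended by the accumulated reward. A risk-sensitive method would additionally have to choose actions based on these quantities, which this sketch does not attempt.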
