Journal: BioSystems

Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function



Abstract

As reinforcement learning algorithms are being applied to increasingly complicated and realistic tasks, it is becoming increasingly difficult to solve such problems within a practical time frame. Hence, we focus on a satisficing strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing (RS) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to the K-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that RS is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of RS is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of RS with that of other representative algorithms for the K-armed bandit problems.
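The satisficing strategy described in the abstract can be sketched in a small simulation. The concrete value form used below, RS_i = n_i * (mean_i - aspiration), and all function and variable names are illustrative assumptions, not the paper's exact specification: an arm whose sample mean exceeds the aspiration level gains value the more it is pulled (risk-averse exploitation), while an arm below the aspiration level loses value as it is pulled, driving exploration elsewhere (risk-prone behavior) — all under a greedy policy.

```python
import random

def rs_bandit(probs, aspiration, steps=10000, seed=0):
    """Greedy policy on an assumed risk-sensitive satisficing (RS) value.

    For each arm i with pull count n_i and sample mean mean_i:
        RS_i = n_i * (mean_i - aspiration)
    Arms above the aspiration level grow in value as they are
    exploited; arms below it become increasingly negative,
    pushing the greedy choice toward other arms.
    """
    rng = random.Random(seed)
    k = len(probs)
    n = [0] * k          # pull counts
    mean = [0.0] * k     # incremental sample-mean rewards
    for _ in range(steps):
        untried = [i for i in range(k) if n[i] == 0]
        if untried:
            i = untried[0]  # try every arm once first
        else:
            i = max(range(k), key=lambda j: n[j] * (mean[j] - aspiration))
        r = 1.0 if rng.random() < probs[i] else 0.0  # Bernoulli reward
        n[i] += 1
        mean[i] += (r - mean[i]) / n[i]
    return n

# With the aspiration level set between the best and second-best
# arms, only the best arm satisfices, so pulls should concentrate
# on it, consistent with the guaranteed-satisficing claim.
counts = rs_bandit([0.3, 0.5, 0.7], aspiration=0.6)
best = counts.index(max(counts))
```

Setting the aspiration level between the top two arm means (here 0.6, between 0.5 and 0.7) corresponds to the "optimal level" mentioned in the abstract, under which satisficing implies optimizing.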

