Journal: BioSystems

Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function



Abstract

As reinforcement learning algorithms are being applied to increasingly complicated and realistic tasks, it is becoming increasingly difficult to solve such problems within a practical time frame. Hence, we focus on a satisficing strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing (RS) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to the K-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that RS is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of RS is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of RS with that of other representative algorithms for the K-armed bandit problems.
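The satisficing strategy described in the abstract can be sketched in a small simulation. The concrete value form used below, RS_i = n_i * (mean_i - aspiration), and all function and variable names are illustrative assumptions, not the paper's exact specification: an arm whose sample mean exceeds the aspiration level gains value the more it is pulled (risk-averse exploitation), while an arm below the aspiration level loses value as it is pulled, driving exploration elsewhere (risk-prone behavior) — all under a greedy policy.

```python
import random

def rs_bandit(probs, aspiration, steps=10000, seed=0):
    """Greedy policy on an assumed risk-sensitive satisficing (RS) value.

    For each arm i with pull count n_i and sample mean mean_i:
        RS_i = n_i * (mean_i - aspiration)
    Arms above the aspiration level grow in value as they are
    exploited; arms below it become increasingly negative,
    pushing the greedy choice toward other arms.
    """
    rng = random.Random(seed)
    k = len(probs)
    n = [0] * k          # pull counts
    mean = [0.0] * k     # incremental sample-mean rewards
    for _ in range(steps):
        untried = [i for i in range(k) if n[i] == 0]
        if untried:
            i = untried[0]  # try every arm once first
        else:
            i = max(range(k), key=lambda j: n[j] * (mean[j] - aspiration))
        r = 1.0 if rng.random() < probs[i] else 0.0  # Bernoulli reward
        n[i] += 1
        mean[i] += (r - mean[i]) / n[i]
    return n

# With the aspiration level set between the best and second-best
# arms, only the best arm satisfices, so pulls should concentrate
# on it, consistent with the guaranteed-satisficing claim.
counts = rs_bandit([0.3, 0.5, 0.7], aspiration=0.6)
best = counts.index(max(counts))
```

Setting the aspiration level between the top two arm means (here 0.6, between 0.5 and 0.7) corresponds to the "optimal level" mentioned in the abstract, under which satisficing implies optimizing.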

