Performance in Multi-Armed Bandit Tasks in Relation to Ambiguity-Preference Within a Learning Algorithm

Kim Song-Ju; Takahashi Taiki


Abstract

The Ellsberg paradox in decision theory posits that people will inevitably choose a known probability of winning over an unknown probability of winning, even if the known probability is low. One of the prevailing theories that addresses the Ellsberg paradox is known as "ambiguity aversion." In this study, we investigate the properties of ambiguity aversion in four distinct types of reinforcement learning algorithms: UCB1-tuned, modified UCB1-tuned, softmax, and tug-of-war. We consider a scenario in which there are two slot machines, each of which dispenses a coin with a probability generated by its own probability density function (PDF). We then investigate the choices a learning algorithm makes in such multi-armed bandit tasks. Learning algorithms react differently in these tasks depending on their ambiguity preference. Notably, we discovered a clear performance enhancement related to ambiguity preference in a learning algorithm. Although this study does not directly address the ambiguity-aversion theory highlighted by the Ellsberg paradox, the differences between learning algorithms suggest that there is room for further study of the Ellsberg paradox and decision theory.
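The following minimal Python sketch makes the task setting concrete. It models a two-machine bandit in which each play's win probability is drawn from that machine's own PDF. The bandit is played two ways: by UCB1-tuned, using the index of Auer, Cesa-Bianchi and Fischer (2002), and by softmax (Boltzmann) action selection. This is not the authors' code. Several choices here are hypothetical assumptions for illustration:

- the PDFs: a fixed 0.4 for the "known" machine and Beta(2, 2) for the "ambiguous" one;
- the per-play draw;
- the temperature `tau=0.1`;
- the helper names `ucb1_tuned_choice`, `softmax_choice`, and `play`.

The modified UCB1-tuned and tug-of-war algorithms are not reproduced.

```python
import math
import random


def ucb1_tuned_choice(counts, sums, sq_sums):
    """Return the arm with the highest UCB1-tuned index (Auer et al., 2002)."""
    # Play every arm once before the index is defined.
    for j, n in enumerate(counts):
        if n == 0:
            return j
    total = sum(counts)  # total number of plays so far
    best_arm, best_index = 0, -math.inf
    for j, n in enumerate(counts):
        mean = sums[j] / n
        # Empirical variance plus a confidence term: an upper bound on the arm's variance.
        var_bound = sq_sums[j] / n - mean ** 2 + math.sqrt(2 * math.log(total) / n)
        index = mean + math.sqrt(math.log(total) / n * min(0.25, var_bound))
        if index > best_index:
            best_arm, best_index = j, index
    return best_arm


def softmax_choice(counts, sums, tau, rng):
    """Sample an arm with Boltzmann (softmax) probabilities over empirical means."""
    # Unplayed arms get an estimate of 0.0 (an assumption of this sketch).
    means = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    top = max(means)  # shift by the max for numerical stability
    weights = [math.exp((q - top) / tau) for q in means]
    return rng.choices(range(len(means)), weights=weights)[0]


def play(machine_pdfs, policy, steps, seed=0, tau=0.1):
    """Run one bandit task and return (coins won, plays per machine)."""
    if policy not in ("ucb1-tuned", "softmax"):
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)
    k = len(machine_pdfs)
    counts, sums, sq_sums = [0] * k, [0.0] * k, [0.0] * k
    coins = 0
    for _ in range(steps):
        if policy == "ucb1-tuned":
            j = ucb1_tuned_choice(counts, sums, sq_sums)
        else:
            j = softmax_choice(counts, sums, tau, rng)
        p = machine_pdfs[j](rng)  # this play's win probability, drawn from machine j's PDF
        reward = 1 if rng.random() < p else 0  # one coin or nothing
        counts[j] += 1
        sums[j] += reward
        sq_sums[j] += reward * reward
        coins += reward
    return coins, counts


# Hypothetical machines: a known probability vs. an ambiguous Beta(2, 2) draw.
machines = [
    lambda rng: 0.4,                        # known: always 0.4
    lambda rng: rng.betavariate(2.0, 2.0),  # ambiguous: mean 0.5
]

if __name__ == "__main__":
    for policy in ("ucb1-tuned", "softmax"):
        coins, counts = play(machines, policy, steps=10_000, seed=1)
        print(policy, coins, counts)
```

In this sketch the Beta(2, 2) probability is redrawn on every play, so the ambiguous machine pays out at its mean rate of 0.5. A variant that draws one probability per task would model ambiguity differently.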


Bibliographic details

Source: Frontiers in Applied Mathematics and Statistics, 2018, Issue 1
Authors: Kim Song-Ju; Takahashi Taiki
Original format: PDF
Chinese Library Classification: Statistics
Keywords: Decision Making; Ellsberg paradox; ambiguity aversion; reinforcement learning; machine learning; artificial intelligence; natural computing; neuroeconomics

Similar documents


1. Rethinking the Gold Standard With Multi-armed Bandits: Machine Learning Allocation Algorithms for Experiments. Kaibel Chris, Biemann Torsten. Organizational Research Methods, 2021, no. 1.

2. Foraging decisions as multi-armed bandit problems: Applying reinforcement learning algorithms to foraging data. Morimoto Juliano. Journal of Theoretical Biology, 2019.

3. Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems. Koulouriotis DE, Xanthopoulos A. Applied Mathematics and Computation, 2008, no. 2.

4. Bio-inspired meta-learning for active exploration during non-stationary multi-armed bandit tasks. George Velentzas, Costas Tzafestas, Mehdi Khamassi. 2017 Intelligent Systems Conference, 2017.

5. Using student mood and task performance to train classifier algorithms to select effective coaching strategies within intelligent tutoring systems (ITS). Sottilare, Robert A. Dissertation, 2009.

6. Within-Subject Performance on a Real-Life Complex Task and Traditional Lab Experiments: Measures of Word Learning, Raven Matrices, Tapping, and CPR. Florian Sense, Sarah Maaß, Kevin Gluck, 2019.

7. Aggregation of multi-armed bandits learning algorithms for opportunistic spectrum access. Lilian Besson, Emilie Kaufmann, Christophe Moy, 2018.
