Proceedings of the IEEE

A Decision Task in a Social Context: Human Experiments, Models, and Analyses of Behavioral Data

Abstract

To investigate the influence of information about fellow group members in a constrained decision-making context, we develop four two-armed bandit tasks in which subjects freely select one of two options ($A$ or $B$) and are informed of the resulting reward after each choice. Rewards are determined by the fraction $x$ of past $A$ choices through two functions $f_A(x)$ and $f_B(x)$ (unknown to the subject), which intersect at a matching point $\bar{x}$ that does not in general correspond to globally optimal behavior. Playing individually, subjects typically remain close to the matching point, although some discover the optimum. Each task is designed to probe a different type of behavior, and subjects work in parallel in groups of five, receiving feedback on other group members' choices, on their rewards, on both, or no information about others' behavior. We employ a soft-max choice model that emerges from a drift-diffusion process, a model commonly used to describe perceptual decision making with noisy stimuli. Here the stimuli are replaced by estimates of expected rewards produced by a temporal-difference reinforcement-learning algorithm, augmented to include appropriate feedback terms. Models are fitted for each task and feedback condition, and we compare both choice allocations averaged across subjects and individual choice sequences to highlight differences between tasks and between subjects. The most complex model, involving both choice and reward feedback, contains only four parameters, yet it reveals significant differences in individual strategies. Strikingly, we find that reward feedback can be either detrimental or advantageous to performance, depending on the task.
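The abstract combines three ingredients: a reward schedule that depends on the recent fraction $x$ of $A$ choices, a learner that updates reward estimates after each outcome, and a soft-max choice rule. The soft-max link to drift-diffusion is standard: for a diffusion starting midway between thresholds $\pm z$ with drift $\mu$ and noise variance $c^2$, the probability of reaching the upper boundary is $1/(1+e^{-2\mu z/c^2})$, so a drift proportional to $Q_A - Q_B$ yields a logistic (two-option soft-max) rule. The Python sketch below is a minimal illustration under stated assumptions, not the authors' model: the linear schedules `f_A` and `f_B`, the 20-trial window, the learning rate `alpha`, the inverse temperature `beta`, and the initial estimates are all hypothetical; the update is a plain delta rule (TD(0) with no successor-state term); and the group feedback terms the paper adds are omitted, so it covers only the individual, no-feedback condition.

```python
import math
import random
from collections import deque


# Hypothetical reward schedules (not the paper's): linear in x, the fraction
# of A choices over a recent window. They cross at the matching point
# x_bar = 0.75, while the mean reward x*f_A(x) + (1-x)*f_B(x) peaks at x = 0.5.
def f_A(x: float) -> float:
    return 0.8 - 0.6 * x


def f_B(x: float) -> float:
    return 0.2 + 0.2 * x


def mean_reward(x: float) -> float:
    """Expected reward per trial if a fraction x of choices go to A."""
    return x * f_A(x) + (1.0 - x) * f_B(x)


def softmax_prob_A(q_a: float, q_b: float, beta: float) -> float:
    """Two-option soft-max (logistic) probability of choosing A."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def simulate(n_trials: int = 500, window: int = 20, alpha: float = 0.1,
             beta: float = 10.0, seed: int = 0):
    """Simulate one hypothetical individual (no-feedback) subject.

    Returns the list of choices ('A' or 'B') and the rewards received.
    """
    rng = random.Random(seed)
    # Assumed initial history: alternating A/B, so x starts at 0.5.
    history = deque((i % 2 == 0 for i in range(window)), maxlen=window)
    q = {"A": 0.5, "B": 0.5}  # assumed initial reward estimates
    choices, rewards = [], []
    for _ in range(n_trials):
        # Fraction of A among the previous `window` choices.
        x = sum(history) / window
        p_a = softmax_prob_A(q["A"], q["B"], beta)
        choice = "A" if rng.random() < p_a else "B"
        r = f_A(x) if choice == "A" else f_B(x)
        # Delta-rule update of the chosen option's estimate only.
        q[choice] += alpha * (r - q[choice])
        history.append(choice == "A")
        choices.append(choice)
        rewards.append(r)
    return choices, rewards
```

With these schedules, $f_A$ and $f_B$ cross at $\bar{x} = 0.75$, whereas the mean reward $x f_A(x) + (1-x) f_B(x)$ peaks at $x = 0.5$, mirroring the abstract's point that the matching point need not be optimal. Because each estimate tracks only the local return of its own option, a learner of this kind tends to be pulled toward the allocation where those returns are equal rather than toward the one that maximizes total reward.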