PLoS Computational Biology

Theory of Choice in Bandit Information Sampling and Foraging Tasks

Abstract

Decision making has been studied with a wide array of tasks. Here we examine the theoretical structure of bandit, information sampling, and foraging tasks. These tasks move beyond tasks in which the choice on the current trial does not affect future expected rewards. We modeled these tasks using Markov decision processes (MDPs). MDPs provide a general framework for modeling tasks in which decisions affect the information on which future choices will be made. Under the assumption that agents maximize expected rewards, MDPs provide normative solutions. We find that all three classes of tasks pose choices among actions that trade off immediate and future expected rewards. The tasks drive these trade-offs in unique ways, however. For bandit and information sampling tasks, increasing uncertainty or lengthening the time horizon shifts value to actions that pay off in the future. Correspondingly, decreasing uncertainty increases the relative value of actions that pay off immediately. For foraging tasks, the time horizon plays the dominant role, because choices do not affect future uncertainty in these tasks.
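The trade-off described in the abstract can be illustrated with a small finite-horizon dynamic program. The following is a minimal sketch and not the authors' model: it assumes a two-armed Bernoulli bandit in which one arm has a known reward probability (`P_KNOWN`, a made-up value) and the other arm is described by a Beta posterior with counts `(a, b)`, and it assumes no discounting. The names `action_values` and `exploration_advantage` are hypothetical.

```python
from functools import lru_cache

# Assumed reward probability of the arm whose payoff is fully known.
P_KNOWN = 0.55


@lru_cache(maxsize=None)
def action_values(a, b, horizon):
    """Return (Q_known, Q_uncertain) with `horizon` choices remaining.

    (a, b) are the Beta posterior counts (successes, failures) for the
    uncertain arm, so its current expected reward is a / (a + b).
    Values are undiscounted sums of expected future rewards.
    """
    if horizon == 0:
        return 0.0, 0.0

    # Choosing the known arm yields P_KNOWN now and leaves beliefs unchanged.
    q_known = P_KNOWN + max(action_values(a, b, horizon - 1))

    # Choosing the uncertain arm yields its posterior mean now and also
    # updates the belief state, which changes the value of later choices.
    p = a / (a + b)
    q_uncertain = (
        p * (1.0 + max(action_values(a + 1, b, horizon - 1)))
        + (1.0 - p) * max(action_values(a, b + 1, horizon - 1))
    )
    return q_known, q_uncertain


def exploration_advantage(a, b, horizon):
    """Q_uncertain minus Q_known; positive means sampling the uncertain arm is better."""
    q_known, q_uncertain = action_values(a, b, horizon)
    return q_uncertain - q_known


if __name__ == "__main__":
    for horizon in (1, 2, 5, 10):
        print(horizon,
              exploration_advantage(1, 1, horizon),    # high uncertainty
              exploration_advantage(10, 10, horizon))  # lower uncertainty
```

Under these assumptions, a Beta(1, 1) belief has a lower immediate expected reward (0.5) than the known arm (0.55), so with one choice remaining the known arm is preferred. With two choices remaining the uncertain arm becomes the better option, because a success would reveal an arm worth exploiting on the next trial. With a Beta(10, 10) belief, which has the same mean but less uncertainty, the known arm remains preferred at a horizon of two.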