Source journal: PLoS Computational Biology

Theory of Choice in Bandit, Information Sampling and Foraging Tasks

Abstract

Decision making has been studied with a wide array of tasks. Here we examine the theoretical structure of bandit, information sampling, and foraging tasks. These tasks move beyond tasks in which the choice in the current trial does not affect future expected rewards. We have modeled these tasks using Markov decision processes (MDPs). MDPs provide a general framework for modeling tasks in which decisions affect the information on which future choices will be made. Under the assumption that agents maximize expected rewards, MDPs provide normative solutions. We find that all three classes of tasks pose choices among actions that trade off immediate and future expected rewards. The tasks drive these trade-offs in unique ways, however. For bandit and information sampling tasks, increasing uncertainty or the time horizon shifts value to actions that pay off in the future. Correspondingly, decreasing uncertainty increases the relative value of actions that pay off immediately. For foraging tasks the time horizon plays the dominant role, as choices do not affect future uncertainty in these tasks.
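
The trade-off between immediate and future expected rewards described in the abstract can be illustrated with a small finite-horizon MDP. The following is a minimal sketch, not the authors' model or code. It assumes a two-armed Bernoulli bandit in which one arm has a known reward probability and the other is described by a Beta(a, b) belief, and it solves for the action values by backward induction over belief states. The function name `action_values` and all of its parameters are hypothetical and chosen only for illustration.

```python
from functools import lru_cache


def action_values(a, b, p_known, horizon):
    """Return (q_known, q_unknown) for an illustrative Bernoulli bandit.

    Assumptions (not taken from the paper): one arm pays 1 with known
    probability p_known; the other arm has an unknown reward probability
    with a Beta(a, b) belief. Rewards are undiscounted and the agent has
    `horizon` trials left.
    """

    @lru_cache(maxsize=None)
    def value(a, b, t):
        # Optimal expected total reward with t trials left in belief state (a, b).
        if t == 0:
            return 0.0
        return max(q_values(a, b, t))

    def q_values(a, b, t):
        p = a / (a + b)  # posterior mean of the uncertain arm
        # The known arm yields no information, so the belief state is unchanged.
        q_known = p_known + value(a, b, t - 1)
        # The uncertain arm updates the belief: success -> (a + 1, b), failure -> (a, b + 1).
        q_unknown = (p * (1.0 + value(a + 1, b, t - 1))
                     + (1.0 - p) * value(a, b + 1, t - 1))
        return q_known, q_unknown

    return q_values(a, b, horizon)


if __name__ == "__main__":
    # Compare a high-uncertainty belief, Beta(1, 1), with a low-uncertainty
    # belief that has the same mean, Beta(10, 10), at several horizons.
    for a, b in [(1, 1), (10, 10)]:
        for horizon in (1, 2, 4, 8):
            q_known, q_unknown = action_values(a, b, 0.6, horizon)
            print(f"Beta({a},{b}) horizon={horizon}: "
                  f"known={q_known:.4f} uncertain={q_unknown:.4f}")
```

Under these assumptions, with a uniform Beta(1, 1) belief and a known arm that pays with probability 0.6, the known arm has the higher value when one trial remains (0.6 versus 0.5), whereas with four trials remaining the uncertain arm comes out slightly ahead. With a Beta(10, 10) belief, which has the same mean but less uncertainty, the known arm is still preferred at a horizon of four. This matches the qualitative pattern the abstract describes for bandit tasks.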