Loosely symmetric reasoning to cope with the speed-accuracy trade-off

Abstract

When we learn from an unknown environment in order to collect reward, the decisions that agents make face a speed-accuracy trade-off. We lose out if we keep acting greedily, but we cannot maximize reward if we keep searching. From experience, it is assumed that human beings act according to some kind of standard to cope with this trade-off. Hence, we focused on symmetric reasoning, a kind of illogical cognitive property peculiar to human beings, as a valid solution to the speed-accuracy trade-off. In this study, we simulated the N-armed bandit problem as a simple decision-making problem, using the Loosely Symmetric model (LS), a model of flexible and loosely symmetric reasoning. In addition, based on theoretical consideration of LS and the idea of changing the reference point, we developed LS with Variable Reference (LSVR) as a newly improved model and simulated it. As a result, we confirmed that, when there are many choices, LSVR collects overwhelmingly more reward than UCB1, an excellent decision-making model used in Go AI.
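To make the baseline in the abstract concrete, here is a minimal sketch of the standard UCB1 rule on an N-armed bandit. It assumes Bernoulli (0 or 1) rewards, and the function name `ucb1`, its parameters, and the arm probabilities are illustrative choices rather than the paper's experimental setup. LS and LSVR are not shown, because the abstract does not give their formulas.

```python
import math
import random


def ucb1(probs, steps, seed=0):
    """Run UCB1 on a Bernoulli bandit (assumed reward model).

    probs: success probability of each arm (hypothetical values).
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # accumulated reward per arm
    total = 0
    for t in range(1, steps + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the highest mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

Calling `ucb1([0.1, 0.5, 0.9], 5000)` returns the collected reward and the number of pulls per arm. The exploration bonus shrinks as an arm is pulled more often, which is how UCB1 balances searching against acting greedily.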