
NON-STATIONARY DELAYED BANDITS WITH INTERMEDIATE SIGNALS

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting actions from a set of actions to be performed in an environment. One of the methods includes, at each time step: maintaining count data; determining, for each action, a respective current transition probability distribution that includes, for each of the intermediate signals, a respective current transition probability that represents an estimate of the current likelihood that the intermediate signal will be observed if the action is performed; determining, for each intermediate signal, a respective reward estimate that is an estimate of the reward that will be received as a result of the intermediate signal being observed; determining, from the respective current transition probability distributions and the respective reward estimates, a respective action score for each action; and selecting an action to be performed based on the respective action scores.
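
The per-time-step procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how the count data are kept, how the estimates are computed, or how scores lead to a selection. The sketch assumes add-one smoothed frequency estimates for the transition probabilities, a running mean for each signal's reward, an action score equal to the expected reward under those estimates, and greedy selection. It does not model non-stationarity or reward delay beyond accepting rewards separately from signals. The class name `IntermediateSignalBandit` and all method names are hypothetical.

```python
class IntermediateSignalBandit:
    """Illustrative sketch only; all names and estimators are assumptions."""

    def __init__(self, n_actions, n_signals, prior=1.0):
        self.n_actions = n_actions
        self.n_signals = n_signals
        self.prior = prior  # assumed smoothing pseudo-count per signal
        # Count data: how often each signal followed each action.
        self.counts = [[0.0] * n_signals for _ in range(n_actions)]
        # Running totals used to estimate the reward tied to each signal.
        self.reward_sum = [0.0] * n_signals
        self.reward_n = [0] * n_signals

    def transition_probs(self, action):
        """Smoothed estimate of P(signal | action) from the count data."""
        row = self.counts[action]
        total = sum(row) + self.prior * self.n_signals
        return [(c + self.prior) / total for c in row]

    def reward_estimates(self):
        """Mean observed reward per signal (0.0 if none observed yet)."""
        return [s / n if n else 0.0
                for s, n in zip(self.reward_sum, self.reward_n)]

    def action_scores(self):
        """Score each action as its expected reward under the estimates."""
        rewards = self.reward_estimates()
        return [
            sum(p * r for p, r in zip(self.transition_probs(a), rewards))
            for a in range(self.n_actions)
        ]

    def select_action(self):
        """Greedy choice over the action scores (an assumed policy)."""
        scores = self.action_scores()
        return max(range(self.n_actions), key=scores.__getitem__)

    def observe_signal(self, action, signal):
        """Record the intermediate signal seen after taking an action."""
        self.counts[action][signal] += 1.0

    def observe_reward(self, signal, reward):
        """Record a (possibly delayed) reward attributed to a signal."""
        self.reward_sum[signal] += reward
        self.reward_n[signal] += 1
```

In use, a caller would call `select_action()` at each time step, pass the observed intermediate signal to `observe_signal()` right away, and pass the reward to `observe_reward()` whenever it arrives. Separating the two updates lets the transition estimates improve before any reward is known.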
