Online temporal difference learning from incomplete customer interaction histories

Abstract

In one embodiment, an indication that a decision has been requested, selected, or applied with respect to one or more users may be obtained. After this indication is obtained, a value function may be updated, where the value function approximates the expected reward associated with the one or more users over time since the decision was requested, selected, or applied. The value function may be updated by performing or providing one or more updates, where the time at which each update is performed or provided is independent of the activity of the one or more users.
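
The idea of updating a value function on a schedule that does not depend on user activity can be illustrated with a minimal sketch. It assumes a tabular TD(0) learner and a caller-driven clock; the class name `ScheduledTDLearner`, its methods, the state labels, and the `alpha` and `gamma` parameters are all hypothetical and are not taken from the patent, which does not specify an implementation in this abstract.

```python
from collections import defaultdict


class ScheduledTDLearner:
    """Tabular TD(0) sketch: updates fire on a clock tick, not on user events.

    All names here are illustrative assumptions, not the patented method.
    """

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha                 # learning rate
        self.gamma = gamma                 # discount factor per tick
        self.value = defaultdict(float)    # state -> estimated future reward
        self.pending = {}                  # user_id -> [state, reward since last tick]

    def record_decision(self, user_id, state):
        """Start tracking a user once a decision has been applied to them."""
        self.pending[user_id] = [state, 0.0]

    def record_reward(self, user_id, reward):
        """Accumulate reward if the user happens to act; inactivity adds nothing."""
        if user_id in self.pending:
            self.pending[user_id][1] += reward

    def tick(self, current_state_of):
        """Run one TD(0) update per tracked user, whether or not they were active.

        `current_state_of` is a caller-supplied function mapping user_id -> state.
        """
        for user_id, (state, reward) in list(self.pending.items()):
            next_state = current_state_of(user_id)
            target = reward + self.gamma * self.value[next_state]
            self.value[state] += self.alpha * (target - self.value[state])
            # Reset the accumulator and continue from the observed state.
            self.pending[user_id] = [next_state, 0.0]
```

In this sketch a scheduler (for example, a periodic job) would call `tick` at fixed intervals, so a user who never returns still contributes zero-reward updates rather than leaving the interaction history incomplete.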
