
FORECASTING AND LEARNING ACCURATE AND EFFICIENT TARGET POLICY PARAMETERS FOR DYNAMIC PROCESSES IN NON-STATIONARY ENVIRONMENTS



Abstract

The present disclosure relates to systems, methods, and non-transitory computer-readable media that determine target policy parameters that enable target policies to provide improved future performance, even in circumstances where the underlying environment is non-stationary. For example, in one or more embodiments, the disclosed systems utilize counter-factual reasoning to estimate what the performance of the target policy would have been if implemented during past episodes of action-selection. Based on the estimates, the disclosed systems forecast a performance of the target policy for one or more future decision episodes. In some implementations, the disclosed systems further determine a performance gradient for the forecasted performance with respect to varying a target policy parameter for the target policy. In some cases, the disclosed systems use the performance gradient to efficiently modify the target policy parameter, without undergoing the computational expense of expressly modeling variations in underlying environmental functions.
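The three steps named in the abstract (counterfactual estimation over past episodes, forecasting, and a gradient of the forecast with respect to a policy parameter) can be illustrated with a small sketch. The abstract does not disclose the estimator, the forecaster, or the gradient computation, so everything below is an assumption made for illustration: single-step episodes, a softmax target policy over discrete actions, importance weighting as the counterfactual estimator, a least-squares polynomial trend as the forecaster, and a finite-difference gradient in place of an analytic one. All function names are hypothetical.

```python
import numpy as np


def softmax(theta):
    """Action probabilities of a softmax policy with logits theta."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()


def counterfactual_estimates(theta, actions, rewards, behavior_probs):
    """Per-episode importance-weighted estimate of the reward the target
    policy would have earned (assumed estimator, not from the source)."""
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    behavior_probs = np.asarray(behavior_probs, dtype=float)
    weights = softmax(theta)[actions] / behavior_probs
    return weights * rewards


def forecast_next(estimates, degree=1):
    """Fit a polynomial trend over episode index by least squares and
    extrapolate it one episode ahead (assumed forecaster)."""
    k = len(estimates)
    t = np.arange(1, k + 1)
    coeffs = np.polyfit(t, estimates, degree)
    return float(np.polyval(coeffs, k + 1))


def forecast_gradient(theta, actions, rewards, behavior_probs, eps=1e-5):
    """Central finite-difference gradient of the forecast with respect to
    theta (a stand-in for an analytic performance gradient)."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        up = forecast_next(
            counterfactual_estimates(theta + step, actions, rewards, behavior_probs))
        down = forecast_next(
            counterfactual_estimates(theta - step, actions, rewards, behavior_probs))
        grad[i] = (up - down) / (2 * eps)
    return grad


# Illustrative data only: four past episodes logged under a uniform
# behavior policy over two actions, with rewards that drift upward.
actions = [0, 1, 0, 1]
rewards = [1.0, 2.0, 3.0, 4.0]
behavior_probs = [0.5, 0.5, 0.5, 0.5]

theta = np.zeros(2)
grad = forecast_gradient(theta, actions, rewards, behavior_probs)
theta = theta + 0.1 * grad  # one gradient-ascent step on the forecast
print(forecast_next(counterfactual_estimates(theta, actions, rewards, behavior_probs)))
```

In this sketch the parameter update targets the forecasted performance rather than the average of past performance, and no explicit model of how the environment changes is fitted, which mirrors the point made in the abstract's final sentence.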


Bibliographic details

Publication number: US2022121968A1

Publication date: 2022-04-21

Original format: PDF
Applicant/assignee: ADOBE INC.

Application number: US202017072868
Inventors: YASH CHANDAK; GEORGIOS THEOCHAROUS; SRIDHAR MAHADEVAN

Filing date: 2020-10-16
Classification: G06N5/04; G06Q10/06; G06Q10/10
Country: US
Date indexed: 2022-08-25 00:35:18
