FORECASTING AND LEARNING ACCURATE AND EFFICIENT TARGET POLICY PARAMETERS FOR DYNAMIC PROCESSES IN NON-STATIONARY ENVIRONMENTS
Abstract
The present disclosure relates to systems, methods, and non-transitory computer-readable media that determine target policy parameters enabling target policies to provide improved future performance, even where the underlying environment is non-stationary. For example, in one or more embodiments, the disclosed systems use counterfactual reasoning to estimate what the performance of the target policy would have been had it been implemented during past episodes of action selection. Based on these estimates, the disclosed systems forecast the performance of the target policy for one or more future decision episodes. In some implementations, the disclosed systems further determine a performance gradient of the forecasted performance with respect to a target policy parameter of the target policy. In some cases, the disclosed systems use the performance gradient to efficiently modify the target policy parameter, without incurring the computational expense of explicitly modeling variations in the underlying environmental functions.
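The pipeline described in the abstract (counterfactual estimates of past performance, a forecast for a future episode, and a gradient of that forecast with respect to the policy parameter) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the disclosure: the target policy is a softmax over a few context-free actions, the counterfactual estimate is ordinary importance sampling against logged behavior probabilities, and the forecast is a least-squares linear trend over the episode index. The names `forecast_weights` and `forecast_and_gradient` are hypothetical.

```python
import numpy as np


def softmax(theta):
    """Action probabilities of an (assumed) softmax target policy."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()


def forecast_weights(num_episodes, horizon=1):
    """Weights w such that w @ past_estimates is the least-squares
    linear-trend forecast `horizon` episodes past the last one.
    The weights depend only on episode indices, not on the policy."""
    t = np.arange(1, num_episodes + 1, dtype=float)
    phi = np.column_stack([np.ones_like(t), t])        # features [1, episode]
    phi_future = np.array([1.0, num_episodes + horizon])
    return phi_future @ np.linalg.pinv(phi)


def forecast_and_gradient(theta, actions, rewards, behavior_probs, horizon=1):
    """Return the forecasted future performance and its gradient w.r.t. theta.

    actions        : integer action taken in each past episode
    rewards        : reward observed in each past episode
    behavior_probs : probability the logging policy gave to that action
    """
    pi = softmax(theta)
    # Counterfactual (importance-sampling) estimate of what the target
    # policy would have earned in each past episode.
    rho = pi[actions] / behavior_probs
    past_estimates = rho * rewards

    w = forecast_weights(len(rewards), horizon)
    forecast = w @ past_estimates

    # For a softmax policy, d pi[a] / d theta = pi[a] * (onehot(a) - pi).
    onehot = np.eye(len(theta))[actions]
    grad_pi = pi[actions, None] * (onehot - pi[None, :])
    # The forecast is linear in the past estimates, so the gradient passes
    # straight through the fixed weights w; no environment model is fitted.
    grad = (w * rewards / behavior_probs) @ grad_pi
    return forecast, grad
```

Under these assumptions, a gradient-ascent step such as `theta = theta + step_size * grad` moves the policy parameter toward higher forecasted performance. Because the forecast weights are fixed by the episode indices, the gradient requires only the logged actions, rewards, and behavior probabilities.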