JMLR: Workshop and Conference Proceedings
Credit Assignment Techniques in Stochastic Computation Graphs

Abstract

Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.
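
To make the variance issue concrete, the following is a minimal sketch of the simplest possible case, not the estimator developed in the paper. It assumes a graph with a single stochastic node x ~ N(theta, 1) and a single cost f(x) = (x - 3)^2, and compares the plain score-function gradient estimate with one that subtracts a constant baseline. The function name `score_function_grads`, the choice of cost, and the use of the analytic expected loss as the baseline are all illustrative assumptions.

```python
import numpy as np


def score_function_grads(theta, n_samples, baseline=0.0, seed=0):
    """Per-sample score-function estimates of d/dtheta E[f(x)].

    Assumed toy setup (not from the paper): x ~ N(theta, 1), f(x) = (x - 3)^2.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta, scale=1.0, size=n_samples)
    loss = (x - 3.0) ** 2
    # Score of a unit-variance Gaussian: d/dtheta log N(x; theta, 1) = x - theta.
    score = x - theta
    # Subtracting a constant baseline keeps the estimate unbiased because
    # the score has zero mean under the sampling distribution.
    return (loss - baseline) * score


theta = 1.0
n_samples = 100_000

# Analytic reference: E[f(x)] = (theta - 3)^2 + 1, so the gradient is 2 * (theta - 3).
true_grad = 2.0 * (theta - 3.0)

plain = score_function_grads(theta, n_samples)

# Illustrative baseline: the expected loss itself, available in closed form
# here; in practice such a quantity would have to be learned or estimated.
expected_loss = (theta - 3.0) ** 2 + 1.0
with_baseline = score_function_grads(theta, n_samples, baseline=expected_loss)

print("true gradient:       ", true_grad)
print("plain mean, variance:", plain.mean(), plain.var())
print("baseline mean, var.: ", with_baseline.mean(), with_baseline.var())
```

Both estimates should average close to the analytic gradient, while the per-sample variance with the baseline should be noticeably smaller. The baseline used here is a convenient choice for this toy problem rather than the variance-optimal one.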
