Credit Assignment Techniques in Stochastic Computation Graphs

Théophane Weber; Nicolas Heess; Lars Buesing; David Silver

Abstract

Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.
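
To make the variance claim concrete, the following is a minimal sketch of the standard score-function (REINFORCE-style) gradient estimator on a graph with a single Bernoulli node, with and without a constant baseline. It is not the paper's general construction for arbitrary SCGs. The cost function, the parameter value, the sample count, and all function names are assumptions chosen for illustration.

```python
import math
import random
import statistics


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def cost(x):
    # Hypothetical downstream cost; the large constant offset inflates
    # the variance of the estimator that uses no baseline.
    return (x - 0.3) ** 2 + 10.0


def exact_gradient(theta):
    # d/dtheta E[cost(x)] for x ~ Bernoulli(sigmoid(theta)).
    p = sigmoid(theta)
    return p * (1.0 - p) * (cost(1) - cost(0))


def score_function_samples(theta, baseline, n, rng):
    # Each sample is (cost(x) - baseline) * d/dtheta log p(x; theta).
    # For a Bernoulli with a sigmoid parameterization the score is x - p.
    p = sigmoid(theta)
    samples = []
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        score = x - p
        samples.append((cost(x) - baseline) * score)
    return samples


rng = random.Random(0)
theta = 0.5  # assumed parameter value
p = sigmoid(theta)
expected_cost = p * cost(1) + (1.0 - p) * cost(0)  # used as the baseline

naive = score_function_samples(theta, 0.0, 20000, rng)
with_baseline = score_function_samples(theta, expected_cost, 20000, rng)

print("exact gradient:      ", exact_gradient(theta))
print("no baseline:    mean", statistics.mean(naive),
      "variance", statistics.variance(naive))
print("with baseline:  mean", statistics.mean(with_baseline),
      "variance", statistics.variance(with_baseline))
```

Both estimators are unbiased because the score has zero mean, so subtracting a constant from the cost leaves the expectation unchanged. Here the baseline is the exact expected cost, which is available only because the example is tiny; in practice it would have to be estimated or learned.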

Bibliographic details

Source: JMLR: Workshop and Conference Proceedings, 2018, issue 2010, 11 pages
Authors: Théophane Weber; Nicolas Heess; Lars Buesing; David Silver
Original format: PDF
