Approximating Action-Value Functions: Addressing Issues of Dynamic Range

Abstract

Function approximation is necessary when applying reinforcement learning (RL) to Markov decision processes (MDPs) or semi-Markov decision processes (SMDPs) with very large state spaces. An often overlooked issue in approximating Q-functions in either framework arises when an action-value update in one state causes a large policy change in other states. Equivalently, a small change in the Q-function can produce a large change in the implied greedy policy. We call this sensitivity to changes in the Q-function the dynamic range problem and suggest that it can greatly increase the number of training updates required to accurately approximate the optimal policy. We demonstrate that Advantage Learning solves the dynamic range problem in both frameworks and is more robust than some other RL algorithms on these problems. For an MDP, Advantage Learning addresses the issue by rescaling the dynamic range of action values within each state by a constant; for SMDPs, the scaling constant can vary with each action.
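To illustrate the rescaling the abstract refers to, the sketch below shows a tabular Advantage Learning update in which the temporal-difference term is divided by a scaling constant k, expanding the gap between action values within a state (k = 1 recovers standard Q-learning). This is a minimal sketch under our own assumptions; the function name, hyperparameter values, and array layout are illustrative and not taken from the report, and for an SMDP the constant k would typically depend on the action's duration.

    import numpy as np

    def advantage_learning_update(A, s, a, r, s_next,
                                  alpha=0.1, gamma=0.99, k=0.2):
        """One tabular Advantage Learning update (illustrative sketch).

        A : 2-D array of advantage values, indexed as A[state, action].
        k : scaling constant in (0, 1]; k == 1 reduces to Q-learning,
            smaller k widens the spread of action values within a state.
        """
        max_A_s = np.max(A[s])            # value of the current state
        max_A_next = np.max(A[s_next])    # value of the successor state
        # Target: state value plus the TD error rescaled by 1/k, which
        # enlarges the dynamic range of action values within state s.
        target = max_A_s + (r + gamma * max_A_next - max_A_s) / k
        A[s, a] += alpha * (target - A[s, a])
        return A

Because the within-state differences are magnified by 1/k, small approximation errors in the learned function are less likely to flip the implied greedy action, which is the robustness property the abstract attributes to Advantage Learning.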
