Sensors (Basel, Switzerland)

Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration

Abstract

In this paper, a new framework called Compressive Kernelized Reinforcement Learning (CKRL) is proposed for computing near-optimal policies in sequential decision making under uncertainty. It combines non-adaptive, data-independent Random Projections with nonparametric Kernelized Least-Squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality reduction framework in which high-dimensional data is projected onto a random lower-dimensional subspace via spherically random rotation and coordinate sampling. KLSPI introduces the kernel trick into the LSPI framework for Reinforcement Learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In the proposed approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a random basis. We first show how Random Projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI at lower computational cost. The theoretical foundation underlying this approach is a fast approximation of the Singular Value Decomposition (SVD). Finally, simulation results on benchmark MDP domains confirm gains in both computation time and performance in large feature spaces.
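
To make the idea of computing policies in a randomly projected subspace concrete, the following is a minimal sketch and not the authors' implementation. It assumes a dense Gaussian projection matrix in place of the spherically random rotation and coordinate sampling named in the abstract, and a plain linear least-squares fixed-point solve (as in standard LSPI) in place of the kernelized variant. The function names, the sample sizes, the ridge term, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_random_projection(d_high, d_low, rng):
    """Return a (d_low, d_high) matrix with i.i.d. N(0, 1/d_low) entries.

    Assumption: a dense Gaussian matrix stands in for the rotation-plus-
    sampling construction mentioned in the abstract.
    """
    return rng.normal(0.0, 1.0 / np.sqrt(d_low), size=(d_low, d_high))


def lstdq(phi, phi_next, rewards, gamma, ridge=1e-6):
    """Solve A w = b with A = Phi^T (Phi - gamma * Phi') and b = Phi^T r.

    phi, phi_next: (n, d) features of (s, a) and (s', pi(s')).
    The small ridge term is an assumption added for numerical stability.
    """
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + ridge * np.eye(d)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


# Synthetic stand-in data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n, d_high, d_low = 500, 2000, 50
phi = rng.normal(size=(n, d_high))       # high-dimensional features
phi_next = rng.normal(size=(n, d_high))  # features at successor samples
rewards = rng.normal(size=n)

# Project every feature vector into the low-dimensional random subspace.
P = gaussian_random_projection(d_high, d_low, rng)
z, z_next = phi @ P.T, phi_next @ P.T

# Policy evaluation now costs a d_low x d_low solve instead of d_high x d_high.
w = lstdq(z, z_next, rewards, gamma=0.9)

# With this scaling, squared norms are preserved in expectation.
norm_ratio = np.mean(np.sum(z**2, axis=1) / np.sum(phi**2, axis=1))
```

In this sketch the linear solve runs on a 50 x 50 system rather than a 2000 x 2000 one, which is the source of the computational savings the abstract refers to. A full policy-iteration loop would recompute `z_next` under each improved policy and repeat the solve until the weights stop changing.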
