Published in: Annual Allerton Conference on Communication, Control, and Computing

Control of unknown linear systems with Thompson sampling



Abstract

We propose a Thompson-sampling-based learning algorithm for the linear quadratic (LQ) control problem with unknown system parameters. The algorithm, called Thompson sampling with dynamic episodes (TSDE), uses two stopping criteria to determine the lengths of its dynamic episodes. The first stopping criterion controls the growth rate of the episode length. The second is triggered when the determinant of the sample covariance matrix falls below half of its previous value. We show that, under some conditions on the prior distribution, the expected (Bayesian) regret of TSDE accumulated up to time T is bounded by Õ(√T), where Õ(·) hides constants and logarithmic factors. This is the first Õ(√T) bound on the expected regret of learning in LQ control. Numerical simulations illustrate the performance of TSDE.
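The two stopping criteria can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact form of the first criterion, so the sketch assumes an episode may be at most one step longer than the previous one. It also assumes "previous value" means the determinant at the start of the current episode. The function names `should_end_episode` and `episode_lengths` are hypothetical.

```python
import numpy as np


def should_end_episode(steps_in_episode, prev_episode_length,
                       cov_now, cov_at_episode_start):
    """Return True if either (assumed) stopping criterion fires."""
    # Criterion 1 (assumed form): limit the growth of the episode length
    # by ending once the episode is longer than the previous one.
    grew_too_long = steps_in_episode > prev_episode_length
    # Criterion 2: the determinant of the covariance matrix has dropped
    # below half of its value at the start of the episode (assumed reference).
    det_halved = np.linalg.det(cov_now) < 0.5 * np.linalg.det(cov_at_episode_start)
    return bool(grew_too_long or det_halved)


def episode_lengths(covariances):
    """Split a sequence of per-step covariance matrices into episodes.

    Returns the lengths of the completed episodes. In a full algorithm,
    a new parameter sample would be drawn at the start of each episode.
    """
    lengths = []
    prev_length = 0
    steps = 0
    cov_start = covariances[0]
    for cov in covariances:
        steps += 1
        if should_end_episode(steps, prev_length, cov, cov_start):
            lengths.append(steps)
            prev_length, steps, cov_start = steps, 0, cov
    return lengths
```

When the covariance never changes, only the first criterion fires and the episode lengths grow by one each time. A sharp drop in the determinant ends an episode early through the second criterion.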
