Source: JMLR: Workshop and Conference Proceedings

Gain estimation of linear dynamical systems using Thompson Sampling


Abstract

We present the gain estimation problem for linear dynamical systems as a multi-armed bandit. This is an important engineering problem in control design, where performance guarantees are cast in terms of the largest gain of the system's frequency response. The dynamical system is unknown and only noisy input-output data are available. In a more general setup, the noise perturbing the data is non-white and the variance at each frequency band is unknown, resulting in a two-dimensional Gaussian bandit model with unknown mean and scaled-identity covariance matrix. This model corresponds to a two-parameter exponential family. Within the bandit framework, the set of means is given by the frequency response of the system and, unlike traditional bandit problems, the goal is to maximize the probability of choosing the arm whose mean has the largest norm. A problem-dependent lower bound on the expected cumulative regret is derived, and a matching upper bound is obtained for a Thompson Sampling algorithm under a uniform prior over the variances and the two-dimensional means.
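
To make the bandit setup in the abstract concrete, here is a minimal Python sketch of Thompson Sampling for two-dimensional Gaussian arms with unknown mean and covariance sigma^2 * I, where the learner tries to play the arm whose mean has the largest norm. It is an illustration under stated assumptions, not the paper's algorithm as published. The posterior used here follows from assuming a flat prior on (mu, sigma^2), which gives sigma^2 | data ~ InvGamma(n - 2, S / 2) and mu | sigma^2, data ~ N(xbar, (sigma^2 / n) I); the paper's exact prior parameterization may differ. The names `ArmStats`, `sample_posterior_mean`, and `thompson_gain_estimation`, and the choice of three initial pulls per arm (so the posterior is proper), are hypothetical.

```python
import math
import random


class ArmStats:
    """Running sufficient statistics for one arm (2-D samples)."""

    def __init__(self):
        self.n = 0          # number of samples seen
        self.sum_x = 0.0    # sum of first coordinates
        self.sum_y = 0.0    # sum of second coordinates
        self.sum_sq = 0.0   # sum of squared norms

    def update(self, x, y):
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_sq += x * x + y * y


def sample_posterior_mean(stats, rng):
    """Draw a mean from the posterior, ASSUMING a flat prior on (mu, sigma^2).

    Requires stats.n >= 3 so that the inverse-gamma shape n - 2 is positive.
    """
    n = stats.n
    xbar, ybar = stats.sum_x / n, stats.sum_y / n
    # S = sum of squared distances of the samples from their average.
    s = max(stats.sum_sq - n * (xbar * xbar + ybar * ybar), 1e-12)
    # sigma^2 ~ InvGamma(n - 2, S / 2): scale divided by a Gamma(n - 2, 1) draw.
    sigma2 = (s / 2.0) / rng.gammavariate(n - 2, 1.0)
    std = math.sqrt(sigma2 / n)
    return rng.gauss(xbar, std), rng.gauss(ybar, std)


def thompson_gain_estimation(means, sigmas, horizon, seed=0):
    """Simulate the bandit and return the number of pulls per arm.

    means[a] is the true 2-D mean of arm a (e.g. real and imaginary parts of
    a frequency response value); sigmas[a] is its noise standard deviation.
    Both are unknown to the learner and used only to generate samples.
    """
    rng = random.Random(seed)
    k = len(means)
    arms = [ArmStats() for _ in range(k)]

    def pull(a):
        arms[a].update(rng.gauss(means[a][0], sigmas[a]),
                       rng.gauss(means[a][1], sigmas[a]))

    # Initialization: three pulls per arm so every posterior is proper.
    for a in range(k):
        for _ in range(3):
            pull(a)

    # Thompson step: sample a mean per arm, play the arm of largest norm.
    for _ in range(horizon - 3 * k):
        draws = [sample_posterior_mean(arm, rng) for arm in arms]
        pull(max(range(k), key=lambda a: math.hypot(*draws[a])))

    return [arm.n for arm in arms]


if __name__ == "__main__":
    pulls = thompson_gain_estimation(
        means=[(0.5, 0.0), (0.0, 3.0), (0.6, 0.8)],
        sigmas=[0.5, 0.5, 0.5],
        horizon=500,
    )
    print(pulls)
```

In this sketch each arm stands for one frequency, and the arm's mean norm plays the role of the gain at that frequency, so the most-played arm is the learner's estimate of where the largest gain lies.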
