Journal: IEEE Transactions on Neural Networks and Learning Systems

Covariance Matrix Adaptation for Multiobjective Multiarmed Bandits



Abstract

Upper confidence bound (UCB) is a successful multiarmed bandit algorithm for regret minimization. The covariance matrix adaptation (CMA) for Pareto UCB (CMA-PUCB) algorithm considers stochastic reward vectors with correlated objectives. We upper bound the cumulative pseudoregret of pulling suboptimal arms for the CMA-PUCB algorithm by a quantity logarithmic in the number of arms $K$, objectives $D$, and samples $n$, namely $O\left(\ln(nDK) \sum_i \|\Sigma_i\|^2 / \Delta_i\right)$, using a variant of the Bernstein inequality for matrices, where $\Delta_i$ is the regret of pulling the suboptimal arm $i$. For unknown covariance matrices $\Sigma_i$ between objectives, we upper bound the approximation of the covariance matrix using the number of samples to $O\left(n \ln(nDK) + \ln^2(nDK) \sum_i 1/\Delta_i\right)$. Simulations on a three-objective stochastic environment show the applicability of our method.
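
To make the quantities named in the abstract concrete, the following minimal sketch runs the classic single-objective UCB1 index policy on Bernoulli arms and computes the cumulative pseudoregret, i.e. the sum over arms of the gap $\Delta_i$ times the number of pulls of arm $i$. It is not the CMA-PUCB algorithm, whose details the abstract does not give; the arm means, horizon, seed, and function names are all hypothetical choices made for illustration.

```python
import math
import random


def ucb1(means, n, seed=0):
    """Run classic scalar UCB1 on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # T_i: number of times arm i has been pulled
    sums = [0.0] * k   # cumulative reward collected from arm i

    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


def pseudoregret(means, counts):
    """Sum over arms of Delta_i * T_i, where Delta_i = max mean - mean_i."""
    best = max(means)
    return sum((best - m) * c for m, c in zip(means, counts))


means = [0.9, 0.5, 0.2]  # hypothetical arm means (assumption, not from the paper)
counts = ucb1(means, n=2000)
regret = pseudoregret(means, counts)
print(counts, regret)
```

In the multiobjective setting of the paper each reward is a vector and the gap $\Delta_i$ is defined with respect to the Pareto front, so this scalar sketch only illustrates the bookkeeping behind the regret, not the covariance-aware confidence bounds.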
