Conference: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning

Linear Scalarized Knowledge Gradient in the Multi-Objective Multi-Armed Bandits Problem

Abstract

The multi-objective multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward, and these rewards may conflict with one another. There is a set of optimal arms, and the agent's goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function, which converts the multi-objective arms into single-objective arms. The LS function is simple, but it cannot find the entire optimal arm set. We therefore extend the knowledge gradient (KG) policy to the LS function. We propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We compare the two variants experimentally: LS-KG across arms finds the optimal arm set, while LS-KG across dimensions plays the optimal arms fairly.