Linear Scalarized Knowledge Gradient in the Multi-Objective Multi-Armed Bandits Problem

Abstract

The multi-objective multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward, and these rewards may conflict. The agent has a set of optimal arms, and its goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function, which converts the multi-objective arms into single-objective arms. The LS function is simple; however, it cannot find the entire optimal arm set. We therefore extend the knowledge gradient (KG) policy to the LS function. We propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We compare the two variants experimentally: LS-KG across arms finds the optimal arm set, while LS-KG across dimensions plays the optimal arms fairly.
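The limitation of the LS function noted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' LS-KG policy: the mean reward vectors are made-up values, the helper names (`dominates`, `pareto_optimal_arms`, `linear_scalarize`, `best_arm`) are hypothetical, and the means are assumed known, whereas a bandit agent would have to estimate them from play. The sketch sweeps two-objective weight vectors `(w, 1 - w)` and records which arm maximizes the weighted sum.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b: no worse in every objective, better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optimal_arms(means: Sequence[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, m in enumerate(means)
        if not any(dominates(o, m) for j, o in enumerate(means) if j != i)
    ]


def linear_scalarize(mean: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum that turns a reward vector into a single scalar."""
    return sum(w * m for w, m in zip(weights, mean))


def best_arm(means: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Arm with the largest scalarized mean (ties go to the lowest index)."""
    values = [linear_scalarize(m, weights) for m in means]
    return max(range(len(means)), key=values.__getitem__)


# Illustrative (assumed) mean reward vectors for three arms, two objectives.
means = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)]

pareto = pareto_optimal_arms(means)

# Sweep weights (w, 1 - w) on a grid and collect every arm LS ever selects.
found = sorted({best_arm(means, (k / 100, 1 - k / 100)) for k in range(101)})

print("Pareto-optimal arms:", pareto)
print("Arms selected by LS:", found)
```

In this example all three arms are Pareto optimal, yet the arm with means `(0.4, 0.4)` is never selected: its scalarized value is about 0.4 for every weight, while one of the other two arms always reaches at least 0.5. Arms like this, which lie in a non-convex part of the optimal set, are the ones a weighted sum misses.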

Bibliographic details

Source: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2014, pp. 147-152 (6 pages)
Authors: Saba Yahyaa; Madalina M. Drugan; Bernard Manderick
Original format: PDF

Similar literature

1. On the identification and mitigation of weaknesses in the knowledge gradient policy for multi-armed bandits [J]. Edwards James, Fearnhead Paul, Glazebrook Kevin. Probability in the Engineering and Informational Sciences. 2017, No. 2.
2. Multi-objective multi-armed bandit with lexicographically ordered and satisficing objectives [J]. Huyuk Alihan, Tekin Cem. Machine Learning. 2021, No. 6.
3. Hyper-heuristics using multi-armed bandit models for multi-objective optimization [J]. Almeida Carolina P., Goncalves Richard A., Venske Sandra. Applied Soft Computing. 2020, No. 1.
4. Linear Scalarized Knowledge Gradient in the Multi-Objective Multi-Armed Bandits Problem [C]. Saba Yahyaa, Madalina M. Drugan, Bernard Manderick. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. 2014.
5. Offline Evaluation of Multi-Armed Bandit Algorithms Using Bootstrapped Replay on Expanded Data [D]. Dai, Jin. 2021.
6. Smoking and the bandit: A preliminary study of smoker and non-smoker differences in exploratory behavior measured with a multi-armed bandit task [O]. Merideth A. Addicott, John M. Pearson, Jessica Wilson.
7. On the identification and mitigation of weaknesses in the Knowledge Gradient policy for multi-armed bandits [O]. Edwards James, Fearnhead Paul, Glazebrook Kevin David. 2017.
