
Better algorithms for benign bandits



Abstract

The online multi-armed bandit problem and its generalizations are repeated decision-making problems in which the goal is to select one of several possible decisions in every round and incur the associated cost, so that the total cost incurred over all iterations is close to the cost of the best fixed decision in hindsight. The difference between these costs is known as the regret of the algorithm. The term "bandit" refers to the setting in which one observes only the cost of the decision used in a given iteration and no other information.
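To make the definition concrete, the regret over T rounds is the gap Regret_T = sum_{t=1..T} c_t(x_t) - min_x sum_{t=1..T} c_t(x), where x_t is the decision played in round t and c_t is that round's cost function. The sketch below is illustrative only: the function and variable names are not from the paper, and it implements the regret bookkeeping of the problem statement, not any algorithm of the authors. It simulates bandit feedback and computes empirical regret against the best fixed arm in hindsight.

    def bandit_regret(costs, policy):
        # costs[t][i]: cost of arm i in round t; policy(t): arm chosen in round t.
        # Bandit feedback: only costs[t][policy(t)] is revealed to the learner.
        total = 0.0
        for t, round_costs in enumerate(costs):
            arm = policy(t)            # the learner's decision this round
            total += round_costs[arm]  # the only cost the learner observes
        n_arms = len(costs[0])
        best_fixed = min(sum(c[i] for c in costs) for i in range(n_arms))
        return total - best_fixed      # regret vs. best fixed decision in hindsight

    # Example: three arms, two rounds; always playing arm 0 incurs cost 2.0,
    # while the best fixed arm in hindsight (arm 1) incurs 0.5.
    print(bandit_regret([[1.0, 0.2, 0.5], [1.0, 0.3, 0.4]], lambda t: 0))  # 1.5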

