Online Learning with Constraints

Abstract

We study online learning where the objective of the decision maker is to maximize her average long-term reward given that some average constraints are satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule.
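To make the abstract's definitions concrete, here is a minimal Python sketch under stated assumptions. It uses a hypothetical two-action game with a single average-cost constraint c(p, q) <= gamma. Here p is the decision maker's mixed action and q is Nature's empirical action frequency. The matrices `R` and `C`, the budget `gamma`, and the helpers `reward_in_hindsight`, `lower_convex_envelope` and `evaluate` are illustrative names, not taken from the paper. The sketch computes the reward-in-hindsight r*(q) = max_p r(p, q) subject to c(p, q) <= gamma on a grid of q values. It then computes the convex hull of r*. For this sketch, "convex hull of a function" is read as the largest convex function lying below it, which is our interpretation.

```python
from typing import List, Optional, Sequence, Tuple

# Rows index the decision maker's action, columns index Nature's action.
Matrix = Sequence[Sequence[float]]
Point = Tuple[float, float]


def reward_in_hindsight(q: float, R: Matrix, C: Matrix, gamma: float) -> Optional[float]:
    """max_p r(p, q) subject to c(p, q) <= gamma, for a 2x2 toy game.

    p is the probability that the decision maker plays action 1 and q is the
    empirical frequency with which Nature played action 1. Returns None when
    no mixed action satisfies the constraint against q.
    """
    # Expected reward and cost of each pure action against q.
    r = [(1 - q) * R[a][0] + q * R[a][1] for a in (0, 1)]
    c = [(1 - q) * C[a][0] + q * C[a][1] for a in (0, 1)]
    d = c[1] - c[0]  # c(p, q) = c[0] + p * d is linear in p
    if d == 0:
        if c[0] > gamma:
            return None
        lo, hi = 0.0, 1.0
    elif d > 0:
        lo, hi = 0.0, min(1.0, (gamma - c[0]) / d)
    else:
        lo, hi = max(0.0, (gamma - c[0]) / d), 1.0
    if lo > hi:
        return None
    # r(p, q) is also linear in p, so the maximum over [lo, hi] is at an endpoint.
    return max((1 - p) * r[0] + p * r[1] for p in (lo, hi))


def lower_convex_envelope(points: List[Point]) -> List[Point]:
    """Vertices of the largest convex function lying below the points (monotone chain)."""
    hull: List[Point] = []
    for x, y in sorted(points):
        # Drop the last vertex while it fails to make a strict left turn.
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            if (ax - ox) * (y - oy) - (ay - oy) * (x - ox) > 0:
                break
            hull.pop()
        hull.append((x, y))
    return hull


def evaluate(hull: List[Point], q: float) -> float:
    """Linearly interpolate the piecewise-linear envelope at q."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= q <= x1:
            return y0 + (y1 - y0) * (q - x0) / (x1 - x0)
    raise ValueError("q lies outside the sampled range")


# Hypothetical toy game: action 0 "abstain" earns 0 at cost 0; action 1 "act"
# earns 1 but costs 1 whenever Nature plays 1. Average-cost budget gamma = 0.5.
R = [[0.0, 0.0], [1.0, 1.0]]
C = [[0.0, 0.0], [0.0, 1.0]]
gamma = 0.5

# Abstaining is always feasible here, so r*(q) is never None on the grid.
r_star = [(i / 100, reward_in_hindsight(i / 100, R, C, gamma)) for i in range(101)]
hull = lower_convex_envelope(r_star)

print(reward_in_hindsight(0.5, R, C, gamma))  # 1.0
print(evaluate(hull, 0.5))                    # 0.75
```

In this toy game, r*(0) = r*(0.5) = 1 while r*(1) = 0.5, so r* is not convex. At q = 0.5 its convex hull gives only 0.75. The sketch covers only these two target functions, computed on a finite grid. It does not implement the calibrated-forecasting strategy the abstract describes. With more actions or constraints, each r*(q) would instead require solving a small linear program.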