Bandit Convex Optimization in Non-stationary Environments

Peng Zhao; Guanghui Wang; Lijun Zhang; Zhi-Hua Zhou

Abstract

Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is the one-point or two-point function values. In this paper, we investigate BCO in non-stationary environments and choose the dynamic regret as the performance measure, which is defined as the difference between the cumulative loss incurred by the algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length of the comparator sequence, which reflects the non-stationarity of the environments. We propose a novel algorithm that achieves $O(T^{3/4}(1+P_T)^{1/2})$ and $O(T^{1/2}(1+P_T)^{1/2})$ dynamic regret for the one-point and two-point feedback models, respectively. The latter result is optimal, matching the $\Omega(T^{1/2}(1+P_T)^{1/2})$ lower bound established in this paper. Notably, our algorithm is adaptive to non-stationary environments since it does not require prior knowledge of the path-length $P_T$, which is generally unknown. We further extend the algorithm to an anytime version that does not need to know the time horizon $T$ in advance. Moreover, we study the adaptive regret, another widely used performance measure for online learning in non-stationary environments, and design an algorithm that provably enjoys adaptive regret guarantees for BCO problems. Finally, we present empirical studies to validate the effectiveness of the proposed approach.
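The two quantities the abstract relies on, the dynamic regret and the path-length $P_T$, can be made concrete with a small sketch. It assumes the definition of path-length that is standard in this literature, $P_T = \sum_{t=2}^{T} \|u_{t-1} - u_t\|_2$, which the abstract itself does not spell out. The function names and the toy data are hypothetical and only illustrate the definitions; this is not the algorithm proposed in the paper.

```python
import math

def path_length(comparators):
    """Assumed definition: sum of Euclidean distances between consecutive comparators."""
    return sum(math.dist(u_prev, u_next)
               for u_prev, u_next in zip(comparators, comparators[1:]))

def dynamic_regret(losses, decisions, comparators):
    """Cumulative loss of the decisions minus that of the comparator sequence."""
    algorithm_loss = sum(f(x) for f, x in zip(losses, decisions))
    comparator_loss = sum(f(u) for f, u in zip(losses, comparators))
    return algorithm_loss - comparator_loss

# Toy data (hypothetical): quadratic losses whose minimizer drifts once.
targets = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
losses = [lambda x, c=c: sum((xi - ci) ** 2 for xi, ci in zip(x, c))
          for c in targets]

# A player that never moves, compared against the per-round minimizers.
decisions = [(0.0, 0.0)] * len(targets)

p_t = path_length(targets)
regret = dynamic_regret(losses, decisions, targets)
print(p_t, regret)
```

In this toy run the comparator moves once, by a distance of 5, so a static player pays for that drift in every later round. The bounds in the abstract quantify this trade-off by growing with $(1+P_T)^{1/2}$.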

Bibliographic details

Source: Journal of Machine Learning Research, 2021, Issue a, 45 pages
Authors: Peng Zhao; Guanghui Wang; Lijun Zhang; Zhi-Hua Zhou
Original format: PDF
