JMLR: Workshop and Conference Proceedings

Bandits with Movement Costs and Adaptive Pricing


Abstract

We extend the model of Multi-Armed Bandit with unit switching cost to incorporate a metric between the actions. We consider the case where the metric over the actions can be modeled by a complete binary tree, and the distance between two leaves is the size of the subtree of their least common ancestor, which abstracts the case that the actions are points on the continuous interval $[0,1]$ and the switching cost is their distance. In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{kT} + T/k)$, where $k$ is the number of actions and $T$ is the time horizon. When the set of actions corresponds to the whole $[0,1]$ interval, we can exploit our method for the task of bandit learning with Lipschitz loss functions, where our algorithm achieves an optimal regret rate of $\widetilde{\Theta}(T^{2/3})$, which is the same rate one obtains when there is no penalty for movements. As our main application, we use our new algorithm to solve an adaptive pricing problem. Specifically, we consider the case of a single seller faced with a stream of patient buyers. Each buyer has a private value and a window of time in which they are interested in buying, and they buy at the lowest price in the window, if it is below their value. We show that with an appropriate discretization of the prices, the seller can achieve a regret of $\widetilde{O}(T^{2/3})$ compared to the best fixed price in hindsight, which outperforms the previous regret bound of $\widetilde{O}(T^{3/4})$ for the problem.
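The tree metric described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes that the $k = 2^d$ actions are the leaves of a complete binary tree indexed left to right from $0$ to $k-1$, that "size of the subtree" counts leaves, and that the result is divided by $k$ so distances lie in $[0,1]$. The function name `tree_distance` and the normalization are illustrative choices, not taken from the paper.

```python
def tree_distance(i: int, j: int, depth: int) -> float:
    """Movement cost between leaves i and j of a complete binary tree.

    Assumptions (illustrative, not from the source): leaves are indexed
    0 .. 2**depth - 1 from left to right, the subtree size is measured
    in leaves, and the cost is normalized by the total number of leaves.
    """
    k = 1 << depth
    if not (0 <= i < k and 0 <= j < k):
        raise ValueError("leaf index out of range")
    if i == j:
        return 0.0
    # The highest bit in which i and j differ gives the height of their
    # least common ancestor above the leaf level.
    lca_height = (i ^ j).bit_length()
    leaves_under_lca = 1 << lca_height
    return leaves_under_lca / k


if __name__ == "__main__":
    depth = 3  # k = 8 actions, thought of as the points i / 8 in [0, 1]
    for a, b in [(0, 1), (2, 3), (3, 4), (0, 7)]:
        print(a, b, tree_distance(a, b, depth))
```

Under these assumptions the tree distance is never smaller than the interval distance $|i-j|/k$, so a bound on movement cost in the tree metric also bounds the cost on $[0,1]$. Neighboring leaves that fall on opposite sides of the root, such as 3 and 4 when $k = 8$, are charged the full cost of 1 even though they are adjacent on the interval.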
