Bandits with Movement Costs and Adaptive Pricing

Tomer Koren; Roi Livni; Yishay Mansour

Abstract

We extend the model of Multi-Armed Bandit with unit switching cost to incorporate a metric between the actions. We consider the case where the metric over the actions can be modeled by a complete binary tree, and the distance between two leaves is the size of the subtree of their least common ancestor, which abstracts the case that the actions are points on the continuous interval $[0,1]$ and the switching cost is their distance. In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{kT} + T/k)$, where $k$ is the number of actions and $T$ is the time horizon. When the set of actions corresponds to the whole $[0,1]$ interval, we can exploit our method for the task of bandit learning with Lipschitz loss functions, where our algorithm achieves an optimal regret rate of $\widetilde{\Theta}(T^{2/3})$, which is the same rate one obtains when there is no penalty for movements. As our main application, we use our new algorithm to solve an adaptive pricing problem. Specifically, we consider the case of a single seller faced with a stream of patient buyers. Each buyer has a private value and a window of time in which they are interested in buying, and they buy at the lowest price in the window, if it is below their value. We show that with an appropriate discretization of the prices, the seller can achieve a regret of $\widetilde{O}(T^{2/3})$ compared to the best fixed price in hindsight, which outperforms the previous regret bound of $\widetilde{O}(T^{3/4})$ for the problem.
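The two definitions in the abstract (the tree metric between actions and the patient-buyer purchase rule) can be illustrated with a minimal Python sketch. This is not the authors' algorithm, only the two definitions made concrete under stated assumptions. The function names `tree_distance` and `patient_buyer_revenue` are hypothetical. The sketch assumes that leaves are indexed 0 to k-1 from left to right, that the "size" of a subtree means its number of leaves, that the distance is normalized by k so it lies in [0, 1], that staying on the same action costs nothing, and that a buyer also buys when the lowest price equals their value.

```python
def tree_distance(i: int, j: int, k: int) -> float:
    """Movement cost between leaves i and j of a complete binary tree with k leaves.

    Assumption: cost = (number of leaves under the least common ancestor) / k,
    and the cost of not moving (i == j) is 0.
    """
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    if not (0 <= i < k and 0 <= j < k):
        raise ValueError("leaf indices must lie in [0, k)")
    if i == j:
        return 0.0
    # The position of the highest bit where i and j differ is the height
    # of their least common ancestor above the leaves.
    height = (i ^ j).bit_length()
    leaves_under_lca = 1 << height
    return leaves_under_lca / k


def patient_buyer_revenue(prices: list[float], arrival: int,
                          window: int, value: float) -> float:
    """Revenue from one patient buyer who sees prices[arrival:arrival + window].

    The buyer pays the lowest price in the window if it does not exceed
    their private value (ties count as a purchase, by assumption);
    otherwise the seller earns nothing.
    """
    window_prices = prices[arrival:arrival + window]
    if not window_prices:
        return 0.0
    lowest = min(window_prices)
    return lowest if lowest <= value else 0.0
```

For example, with k = 8 the adjacent leaves 3 and 4 sit in different halves of the tree, so `tree_distance(3, 4, 8)` is 1.0 even though the corresponding points on the interval are only 1/8 apart. Under these assumptions the tree cost is never smaller than the interval distance |i - j| / k. Separately, taking k on the order of $T^{1/3}$ makes both terms of $\sqrt{kT} + T/k$ of order $T^{2/3}$, which matches the rate quoted in the abstract.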


Bibliographic details

Source: JMLR: Workshop and Conference Proceedings, 2017, Issue 3, 27 pages
Authors: Tomer Koren; Roi Livni; Yishay Mansour
Original format: PDF
Chinese Library Classification: Artificial intelligence theory

