Improving multi-armed bandit algorithms in online pricing settings

Trovo Francesco; Paladino Stefano; Restelli Marcello; Gatti Nicola

Abstract

The design of effective bandit algorithms to learn the optimal price is a task of extraordinary importance in all settings in which the demand curve is not known a priori and the estimation process takes a long time, as is customary, e.g., in e-commerce scenarios. In particular, the adoption of effective pricing algorithms may allow companies to increase their profits dramatically. In this paper, we exploit the structure of the pricing problem in online scenarios to improve the performance of state-of-the-art general-purpose bandit algorithms. More specifically, we make use of the monotonicity of the customer demand curve, which implies the same monotonic behavior of the conversion rates, and we exploit the fact that, in many scenarios, companies have a priori information about the order of magnitude of the conversion rate. We design techniques that are applicable, in principle, to any bandit algorithm and that exploit these two properties, and we apply them to Upper Confidence Bound policies in both stationary and nonstationary environments. We show that algorithms exploiting these two properties may significantly outperform state-of-the-art bandit policies in most configurations, and that the improvement grows as the number of arms increases. In particular, simulations based on real-world data show that our algorithms may increase the profit by 300% or more compared to the performance achieved by state-of-the-art bandit algorithms. Furthermore, we formally prove that the empirical improvement provided by our algorithms can be achieved without incurring any cost in terms of theoretical guarantees. Indeed, our algorithms have the same asymptotic worst-case regret bounds as the bandit algorithms previously known in the state of the art. (C) 2018 Elsevier Inc. All rights reserved.
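The abstract does not spell out the algorithms, so the following is only a minimal sketch of how the two properties could be plugged into a UCB1-style index. It is not the authors' method. It assumes prices are sorted in increasing order, that the conversion rate is non-increasing in price, and that prior knowledge can be encoded as a cap on the conversion rate. The names monotone_ucb_indices, run, and rate_cap are hypothetical and chosen for illustration.

```python
import math
import random


def monotone_ucb_indices(prices, pulls, successes, t, rate_cap=1.0):
    """Price-weighted UCB indices with a monotonicity correction.

    Assumptions (not taken from the paper): prices are sorted in
    increasing order, the true conversion rate is non-increasing in
    price, and rate_cap is a known upper limit on any conversion rate.
    """
    bounds = []
    for n, s in zip(pulls, successes):
        if n == 0:
            bounds.append(rate_cap)  # no data yet: fall back to the prior cap
        else:
            radius = math.sqrt(2.0 * math.log(t) / n)  # UCB1 confidence radius
            bounds.append(min(rate_cap, s / n + radius))
    # A higher price cannot convert better than a lower one, so its upper
    # bound is capped by the tightest bound among all lower prices.
    running = rate_cap
    tightened = []
    for b in bounds:
        running = min(running, b)
        tightened.append(running)
    # Expected revenue per customer is price times conversion rate.
    return [p * b for p, b in zip(prices, tightened)]


def run(prices, rates, horizon, seed=0):
    """Simulate the policy on Bernoulli purchases with the given true rates."""
    rng = random.Random(seed)
    k = len(prices)
    pulls, successes = [0] * k, [0] * k
    revenue = 0.0
    for t in range(1, horizon + 1):
        idx = monotone_ucb_indices(prices, pulls, successes, t)
        arm = max(range(k), key=idx.__getitem__)
        sold = rng.random() < rates[arm]
        pulls[arm] += 1
        successes[arm] += int(sold)
        revenue += prices[arm] * sold
    return pulls, revenue


if __name__ == "__main__":
    pulls, revenue = run([1.0, 2.0, 3.0], [0.5, 0.3, 0.05], horizon=2000)
    print(pulls, revenue)
```

In this sketch the monotonicity step only ever lowers an upper bound, so a noisy, over-optimistic estimate at a high price is corrected by the better-sampled lower prices. A tighter rate_cap plays the same role for arms that have not been tried yet.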

Bibliographic details

Source: 《高分子論文集》, 2018, issue 7, pp. 196-235 (40 pages)

Authors: Trovo Francesco; Paladino Stefano; Restelli Marcello; Gatti Nicola

Author affiliation (all four authors): Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy

Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords: Multi-Armed Bandit; Pricing; Nonstationary MAB
