
Improving multi-armed bandit algorithms in online pricing settings


Abstract

The design of effective bandit algorithms to learn the optimal price is a task of extraordinary importance in all settings in which the demand curve is not known a priori and the estimation process takes a long time, as is customary, e.g., in e-commerce scenarios. In particular, the adoption of effective pricing algorithms may allow companies to increase their profits dramatically. In this paper, we exploit the structure of the pricing problem in online scenarios to improve the performance of state-of-the-art general-purpose bandit algorithms. More specifically, we make use of the monotonicity of the customer demand curve, which suggests the same behavior for the conversion rates, and we exploit the fact that, in many scenarios, companies have a priori information about the order of magnitude of the conversion rate. We design techniques applicable in principle to any bandit algorithm capable of exploiting these two properties, and we apply them to Upper Confidence Bound policies in both stationary and nonstationary environments. We show that algorithms exploiting these two properties may significantly outperform state-of-the-art bandit policies in most configurations, and that the improvement grows as the number of arms increases. In particular, simulations based on real-world data show that our algorithms may increase the profit by 300% or more compared to state-of-the-art bandit algorithms. Furthermore, we formally prove that the empirical improvement provided by our algorithms can be achieved without incurring any cost in terms of theoretical guarantees. Indeed, our algorithms have the same asymptotic worst-case regret bounds as the bandit algorithms previously known in the state of the art. © 2018 Elsevier Inc. All rights reserved.
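As a rough illustration of the two properties mentioned in the abstract, the sketch below shows one way a UCB1-style index could use them. It is not the paper's algorithm, only a minimal example under stated assumptions: prices are sorted in ascending order, conversion rates are non-increasing in price, and a known cap `c_max` on the conversion rate stands in for the a priori order-of-magnitude information. The names `monotone_ucb_indices`, `run`, and `c_max` are hypothetical.

```python
import math
import random


def monotone_ucb_indices(prices, pulls, successes, t, c_max=1.0):
    """Return price * (upper bound on conversion rate) for each arm.

    Assumes prices are sorted ascending and conversion rates are
    non-increasing in price, so the bound of a cheaper arm is also a
    valid bound for every more expensive arm.
    """
    indices = []
    running_min = c_max  # assumed prior cap on any conversion rate
    for price, n, s in zip(prices, pulls, successes):
        if n == 0:
            ucb = c_max  # no data yet: fall back on the prior cap
        else:
            # Standard UCB1 bound, clipped to the prior cap.
            ucb = min(c_max, s / n + math.sqrt(2.0 * math.log(t) / n))
        # Monotonicity: tighten with the bounds of all cheaper arms.
        running_min = min(running_min, ucb)
        indices.append(price * running_min)
    return indices


def run(prices, true_rates, horizon, c_max=1.0, seed=0):
    """Simulate the policy on Bernoulli purchase outcomes."""
    rng = random.Random(seed)
    k = len(prices)
    pulls = [0] * k
    successes = [0] * k
    revenue = 0.0
    for t in range(1, horizon + 1):
        idx = monotone_ucb_indices(prices, pulls, successes, t, c_max)
        arm = max(range(k), key=idx.__getitem__)
        sold = rng.random() < true_rates[arm]
        pulls[arm] += 1
        successes[arm] += int(sold)
        revenue += prices[arm] * sold
    return pulls, revenue
```

In this sketch an arm that has never been pulled inherits the bound of the cheaper arms instead of being forced to the top of the ranking, which is one plausible reason structure of this kind would help more as the number of arms grows.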

Bibliographic details

  • Source
    《高分子論文集》 | 2018, Issue 7 | pp. 196-235 | 40 pages
  • Author affiliations

    Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy;

    Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy;

    Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy;

    Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy;

  • Indexed in: Science Citation Index (SCI), USA;
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Multi-Armed Bandit; Pricing; Nonstationary MAB;

