International Conference on Machine Learning

When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment



Abstract

We consider a single-product dynamic pricing problem in a specific non-stationary setting, where the underlying demand process grows over time in expectation and possibly also in the level of random fluctuation. The decision maker sequentially sets a price in each time period and learns the unknown demand model, with the goal of maximizing expected cumulative revenue over a time horizon T. We prove matching upper and lower bounds on regret and provide near-optimal pricing policies. We show how the growth rate of the random fluctuation over time affects the best achievable regret order and the near-optimal policy design. In the analysis, we show that, surprisingly, whether or not the seller knows the length of the time horizon T in advance yields different optimal regret orders. We then extend the demand model so that the optimal price may vary with time, and present a novel, near-optimal policy for the extended model. Finally, we consider an analogous non-stationary setting in the canonical multi-armed bandit problem and point out that, in contrast to the non-stationary dynamic pricing problem, knowing or not knowing the length of the time horizon T yields the same optimal regret order.
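The learn-and-earn dynamic described in the abstract can be illustrated with a toy explore-then-commit simulation. Everything below is an illustrative assumption, not the paper's actual model or policy: a linear expected demand g(t) * (a - b*p) with unknown (a, b), a known growth factor g(t), and observation noise whose scale grows with the demand level (so the growth-normalized signal has shrinking noise). The seller experiments with random prices, fits the demand curve by least squares, then commits to the estimated revenue-maximizing price; regret is measured against the true optimal price.

```python
import math
import random

def simulate(T=2000, explore=100, seed=0):
    """Toy explore-then-commit pricing in a growing demand environment.

    Hypothetical model (illustrative only): expected demand at price p
    in period t is g(t) * (a - b * p), with a = 2, b = 1 unknown to the
    seller and g(t) = 1 + t/100 a known growth factor. Noise on raw
    demand scales with the demand level, so the normalized observation
    y_t = D_t / g(t) has noise that shrinks over time.
    """
    a_true, b_true, sigma = 2.0, 1.0, 0.1
    rng = random.Random(seed)
    g = lambda t: 1.0 + t / 100.0

    prices, ys = [], []
    regret, optimal = 0.0, 0.0
    p_opt = a_true / (2.0 * b_true)      # true revenue-maximizing price
    p_hat = None

    for t in range(T):
        if t < explore:
            p = rng.uniform(0.5, 1.5)    # exploration prices
        else:
            if p_hat is None:
                # least-squares fit of y = a - b * p on the explore data
                mp = sum(prices) / len(prices)
                my = sum(ys) / len(ys)
                cov = sum((pi - mp) * (yi - my) for pi, yi in zip(prices, ys))
                var = sum((pi - mp) ** 2 for pi in prices)
                b_hat = -cov / var       # slope of y vs. p is -b
                a_hat = my + b_hat * mp
                p_hat = min(max(a_hat / (2.0 * b_hat), 0.1), 2.0)
            p = p_hat                    # commit to the estimated optimum
        # noisy growth-normalized demand observation
        y = a_true - b_true * p + rng.gauss(0.0, sigma) / math.sqrt(g(t))
        if t < explore:
            prices.append(p)
            ys.append(y)
        # accumulate expected (noise-free) revenues, scaled by growth
        optimal += g(t) * p_opt * (a_true - b_true * p_opt)
        regret += g(t) * (p_opt * (a_true - b_true * p_opt)
                          - p * (a_true - b_true * p))
    return regret, optimal

regret, optimal = simulate()
print(f"regret = {regret:.1f}, optimal revenue = {optimal:.1f}")
```

In this sketch the regret has two parts — the cost of pricing suboptimally during exploration and the cost of committing to a slightly wrong price afterward — and it stays a small fraction of the optimal revenue. The paper's policies are more refined; in particular, they must adapt to the unknown growth of the random fluctuation, which this fixed explore-then-commit schedule does not.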
