Journal: Expert Systems with Applications

Dynamic Packaging In E-retailing With Stochastic Demand Over Finite Horizons: A Q-learning Approach


Abstract

This paper investigates how an intelligent agent may use Q-learning, a simulation-based stochastic technique, to make optimal dynamic packaging decisions in an e-retailing setting. When a practical application of dynamic packaging involves a large number of products, the standard Q-learning approach encounters two major problems due to the excessively large state space. First, learning the Q-values in tabular form may be infeasible because of the excessive memory needed to store the table. Second, rewards in the state space may be so sparse that random exploration discovers them only extremely slowly. This paper first describes the state-dependent and event-driven nature of the dynamic packaging problem with a Markov decision process model, then proposes a state-generalization approach based on a distortion measure, and finally puts forward a heuristic exploration/exploitation policy that improves the convergence of Q-learning. We validate our approach in a simulated test.
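To make the baseline concrete, the following is a minimal sketch of tabular Q-learning with an epsilon-greedy exploration/exploitation policy over a finite horizon. It is not the paper's state-generalization or distortion-measure method; the `step` interface, the hyperparameters, and the toy two-state MDP in the usage example are illustrative assumptions only.

```python
import random
from collections import defaultdict

def q_learning(step, n_actions, episodes=500, horizon=10,
               alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` -> (next_state, reward) is a hypothetical simulator
    interface standing in for the stochastic-demand packaging MDP.
    Returns the learned table Q[(state, action)] -> value.
    """
    rng = random.Random(seed)
    Q = defaultdict(float)  # unvisited (state, action) pairs default to 0

    for _ in range(episodes):
        s = 0  # assumed initial state of each finite-horizon episode
        for _ in range(horizon):
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a_: Q[(s, a_)])
            s_next, r = step(s, a)
            best_next = max(Q[(s_next, a_)] for a_ in range(n_actions))
            # Standard Q-learning temporal-difference update.
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q

# Usage on a toy two-state chain: action 1 pays reward 1, action 0 pays 0.
if __name__ == "__main__":
    Q = q_learning(lambda s, a: ((s + a) % 2, float(a)), n_actions=2)
    print(Q[(0, 1)] > Q[(0, 0)])
```

The table itself is why the paper's first objection bites: with many products, the number of `(state, action)` keys explodes, which motivates replacing the `defaultdict` with a generalized state representation, and the sparse-reward objection motivates replacing blind epsilon-greedy exploration with the paper's heuristic policy.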
