IEEE Transactions on Communications

Function Approximation Based Reinforcement Learning for Edge Caching in Massive MIMO Networks

Abstract

Caching popular contents in advance is an important technique for achieving low latency and reduced backhaul congestion in future wireless communication systems. In this article, a multi-cell massive multiple-input multiple-output (MIMO) system is considered, where the locations of base stations are distributed as a Poisson point process. Assuming probabilistic caching, the average success probability (ASP) of the system is derived for a known content popularity (CP) profile, which in practice is time-varying and not known in advance. Further, modeling CP variations across time as a Markov process, reinforcement Q-learning is employed to learn the content placement strategy that optimizes the long-term discounted ASP and the average cache refresh rate. In Q-learning, the number of Q-value updates is large and proportional to the numbers of states and actions. To reduce the space complexity and update requirements and thus make Q-learning scalable, two novel function-approximation-based Q-learning approaches (one linear, one non-linear) are proposed, in which only a constant number of variables (4 and 3, respectively) needs to be updated, irrespective of the numbers of states and actions. The convergence of these approximation-based approaches is analyzed. Simulations verify that both approaches converge and learn a similar best content placement, which demonstrates the applicability and scalability of the proposed approximated Q-learning schemes.
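As a rough illustration of the linear function-approximation idea in the abstract, the Python sketch below runs semi-gradient Q-learning with a constant-size (4-component) feature vector, so only 4 weights are updated regardless of the numbers of states and actions. The feature map, reward model, CP dynamics, and all parameter values are illustrative assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 8, 8          # CP profiles and candidate cache placements (assumed sizes)
ALPHA, GAMMA, EPS = 0.05, 0.9, 0.1  # step size, discount factor, exploration rate

def features(s, a):
    """Constant-size feature vector phi(s, a): 4 components, echoing the
    '4 variables' of the linear scheme (the map itself is illustrative)."""
    return np.array([1.0, s / N_STATES, a / N_ACTIONS,
                     (s * a) / (N_STATES * N_ACTIONS)])

theta = np.zeros(4)  # the only learned parameters

def q(s, a):
    """Approximate action value Q(s, a) = phi(s, a) . theta."""
    return features(s, a) @ theta

def reward(s, a):
    # Placeholder reward: higher "ASP" when the placement a matches the
    # CP profile s, minus a small cache-refresh penalty (both assumed).
    return 1.0 - abs(s - a) / N_STATES - 0.1 * (a / N_ACTIONS)

def step(s):
    # CP profile evolving as a simple Markov chain (assumed dynamics).
    return int((s + rng.choice([-1, 0, 1])) % N_STATES)

s = 0
for t in range(20000):
    # epsilon-greedy selection over cache placements
    if rng.random() < EPS:
        a = int(rng.integers(N_ACTIONS))
    else:
        a = max(range(N_ACTIONS), key=lambda x: q(s, x))
    s_next = step(s)
    r = reward(s, a)
    # semi-gradient Q-learning update touching only the 4 weights
    td_target = r + GAMMA * max(q(s_next, x) for x in range(N_ACTIONS))
    theta += ALPHA * (td_target - q(s, a)) * features(s, a)
    s = s_next

print("learned weights:", theta)
print("greedy placement per CP profile:",
      [max(range(N_ACTIONS), key=lambda x: q(p, x)) for p in range(N_STATES)])
```

Note the contrast with tabular Q-learning, which would maintain and update N_STATES x N_ACTIONS entries: here the per-step update cost and storage stay constant, which is the scalability argument the abstract makes.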