Communications, IET
Weighted cooperative reinforcement learning-based energy-efficient autonomous resource selection strategy for underlay D2D communication



Abstract

Underlay Device-to-Device (D2D) communication is a key technology for delivering high data rates, ultra-low latency, and high spectral and energy efficiency in 5G cellular networks. To achieve its full potential, however, optimal channel allocation and effective co-channel interference management must be accomplished. To address this challenge, we propose a multi-agent reinforcement learning-based autonomous channel selection scheme for D2D communication. The proposed scheme, Weighted Cooperative Q-Learning based Resource Selection (WCopQL-RS), allows a D2D pair to learn to autonomously select a channel from the available resources. The learning process of each D2D transmitter involves cooperation with neighboring D2D agents through the exchange of their latest Q-values. An additional parameter, called the cooperation range, determines the neighboring pairs whose Q-values can be used for learning the optimal policy. Because only this limited prior information is exchanged, the dimensions of each learning agent's Q-value matrix do not grow linearly when the number of D2D pairs within the cell is large. Although WCopQL-RS involves additional information exchange among agents compared to independent learning, it provides improved system throughput and convergence speed. Simulation results show that WCopQL-RS outperforms other existing schemes in terms of average D2D user throughput, energy consumption, and fairness.
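
To make the cooperative-learning idea concrete, the following is a minimal Python sketch of one possible weighted cooperative Q-update for autonomous channel selection with a distance-based cooperation range. The class name D2DAgent, the inverse-distance neighbor weights, the coop_weight blending factor, and the toy contention-based reward are illustrative assumptions; the abstract does not give the paper's exact update equations or SINR-based reward model.

```python
# Sketch of weighted cooperative Q-learning for autonomous channel selection.
# The weighting scheme and reward model below are assumptions for illustration,
# not the paper's exact WCopQL-RS formulation.
import math
import random


class D2DAgent:
    """One D2D transmitter learning which channel to use."""

    def __init__(self, agent_id, position, num_channels,
                 alpha=0.5, gamma=0.9, epsilon=0.1):
        self.agent_id = agent_id
        self.position = position          # (x, y) location of the transmitter
        self.num_channels = num_channels
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.epsilon = epsilon            # exploration probability
        # Bandit-style Q-table: one value per candidate channel.
        self.q = [0.0] * num_channels

    def select_channel(self):
        """Epsilon-greedy channel selection over the local Q-values."""
        if random.random() < self.epsilon:
            return random.randrange(self.num_channels)
        return max(range(self.num_channels), key=lambda c: self.q[c])

    def update(self, channel, reward):
        """Standard Q-learning update for the chosen channel."""
        td_target = reward + self.gamma * max(self.q)
        self.q[channel] += self.alpha * (td_target - self.q[channel])

    def cooperate(self, agents, coop_range, coop_weight=0.3):
        """Blend in Q-values from neighbors inside the cooperation range;
        closer neighbors get larger weights (an assumed weighting scheme)."""
        in_range = [a for a in agents
                    if a is not self and self._distance(a) <= coop_range]
        if not in_range:
            return
        # Inverse-distance weights, normalized to sum to one.
        raw = [1.0 / (1.0 + self._distance(a)) for a in in_range]
        total = sum(raw)
        weights = [w / total for w in raw]
        for c in range(self.num_channels):
            neighbor_avg = sum(w * a.q[c] for w, a in zip(weights, in_range))
            self.q[c] = (1 - coop_weight) * self.q[c] + coop_weight * neighbor_avg

    def _distance(self, other):
        dx = self.position[0] - other.position[0]
        dy = self.position[1] - other.position[1]
        return math.hypot(dx, dy)


if __name__ == "__main__":
    random.seed(0)
    agents = [D2DAgent(i, (random.uniform(0, 100), random.uniform(0, 100)),
                       num_channels=4) for i in range(6)]
    for episode in range(200):
        choices = {a.agent_id: a.select_channel() for a in agents}
        for a in agents:
            ch = choices[a.agent_id]
            # Toy reward: full reward on an uncontended channel, reduced when
            # other pairs reuse it (a crude co-channel interference proxy).
            sharers = sum(1 for c in choices.values() if c == ch) - 1
            a.update(ch, reward=1.0 / (1.0 + sharers))
        for a in agents:
            a.cooperate(agents, coop_range=30.0)
```

In this sketch the exchanged information is limited to the neighbors' Q-value vectors, so each agent's table keeps a fixed size of num_channels entries regardless of how many D2D pairs are in the cell, which mirrors the scalability argument made in the abstract.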
