International Journal of Communication Systems

An efficient actor-critic reinforcement learning for device-to-device communication underlaying sectored cellular network


Abstract

In this paper, a novel reinforcement learning (RL) approach with cell sectoring is proposed to solve the channel and power allocation problem for a device-to-device (D2D)-enabled cellular network when prior traffic information is not known to the base station (BS). Further, this paper derives an optimal policy for resource and power allocation between users with the aim of maximizing the sum-rate of the overall system. Since the behavior of the wireless channel and the traffic requests of users are stochastic in nature, the dynamic property of the environment allows us to employ an actor-critic RL technique that learns the best policy through continuous interaction with its surroundings. The proposed work comprises four phases: cell splitting, clustering, a queuing model, and simultaneous channel and power allocation using actor-critic RL. The implementation of cell splitting with the novel clustering technique increases network coverage, reduces co-channel cell interference, and minimizes the transmission power of nodes, whereas the queuing model addresses the waiting time of users in priority-based data transmission. With the help of a continuous state-action space, the policy-gradient-based actor-critic RL algorithm improves the overall system sum-rate as well as the D2D throughput. The actor adopts a parameterized stochastic policy to output continuous actions, while the critic estimates the value of the policy and criticizes the actor's actions; this reduces the high variance of the policy gradient. Through numerical simulations, the benefit of our resource sharing scheme over existing traditional schemes is verified.
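The actor-critic idea summarized above can be sketched in a few lines. The following is a minimal single-state toy with a continuous action, showing how a learned critic baseline reduces the variance of the policy-gradient update; the quadratic reward is a hypothetical stand-in for the paper's sum-rate objective, and all names and constants are illustrative, not the paper's system model.

```python
import numpy as np

# Minimal one-state actor-critic sketch with a continuous action.
# The reward below is a toy proxy, NOT the paper's sum-rate model.

rng = np.random.default_rng(0)

def reward(power):
    """Hypothetical 'sum-rate' proxy, maximized at a transmit power of 2.0."""
    return -(power - 2.0) ** 2

theta = 0.0                      # actor: mean of a Gaussian policy over power
sigma = 0.5                      # fixed exploration standard deviation
value = 0.0                      # critic: baseline estimate of expected reward
alpha_actor, alpha_critic = 0.05, 0.1

for _ in range(2000):
    action = theta + sigma * rng.standard_normal()  # sample a continuous action
    r = reward(action)
    advantage = r - value                # critic's baseline replaces raw return
    value += alpha_critic * advantage    # critic update (running baseline)
    # Actor update: score function of the Gaussian policy w.r.t. its mean.
    grad_log_pi = (action - theta) / sigma**2
    theta += alpha_actor * advantage * grad_log_pi

print(f"learned mean power: {theta:.2f}")
```

Subtracting the critic's `value` from the sampled reward before the actor update is exactly the variance-reduction mechanism the abstract attributes to the critic; with the raw reward alone, the gradient estimate would fluctuate far more per step.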
