IEEE Conference on Computer Communications

Bringing Fairness to Actor-Critic Reinforcement Learning for Network Utility Optimization

Abstract

Fairness is a crucial design objective in virtually all network optimization problems, where limited system resources are shared by multiple agents. Recently, reinforcement learning has been successfully applied to autonomous online decision making in many network design and optimization problems. However, most of these approaches maximize the total long-term (discounted) reward of all agents without taking fairness into account. In this paper, we propose a family of algorithms that bring fairness to actor-critic reinforcement learning for optimizing general fairness utility functions. In particular, we present a novel method that adjusts the rewards in standard reinforcement learning by a multiplicative weight that depends on both the shape of the fairness utility and some statistics of past rewards. We show that, for a proper choice of the adjusted rewards, a policy gradient update converges to at least a stationary point of general α-fairness utility optimization. This result inspires the design of fairness optimization algorithms in actor-critic reinforcement learning. Evaluations show that the proposed algorithm can be easily deployed in real-world network optimization problems, such as wireless scheduling and video QoE optimization, and can significantly improve the fairness utility value over previous heuristics and learning algorithms.
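The abstract describes scaling each reward by a multiplicative weight that depends on the shape of the fairness utility and on past-reward statistics. One natural reading follows from the chain rule: if J_i is agent i's long-term reward and U_α is the α-fair utility, then ∇θ Σ_i U_α(J_i) = Σ_i U'_α(J_i) ∇θ J_i with U'_α(x) = x^(-α), so agent i's reward can be weighted by U'_α evaluated at an estimate of J_i. The sketch below assumes that estimate is an exponential moving average of past rewards; the class `FairRewardAdjuster` and its parameters `beta`, `init_avg` and `eps` are hypothetical illustrations, not the authors' algorithm, and the paper's specific choice of weights and statistics that guarantees convergence is not reproduced here.

```python
import math
from typing import List, Sequence


def alpha_fair_utility(x: float, alpha: float) -> float:
    """Standard alpha-fair utility: log(x) if alpha == 1, else x^(1-alpha)/(1-alpha)."""
    if x <= 0:
        raise ValueError("alpha-fair utility is defined only for x > 0")
    if alpha == 1.0:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)


def alpha_fair_weight(x: float, alpha: float) -> float:
    """Marginal utility U'_alpha(x) = x^(-alpha); larger for agents with smaller x."""
    if x <= 0:
        raise ValueError("marginal utility is defined only for x > 0")
    return x ** (-alpha)


class FairRewardAdjuster:
    """Hypothetical helper (not from the paper): scales each agent's reward by
    U'_alpha evaluated at an exponential moving average of its past rewards."""

    def __init__(self, num_agents: int, alpha: float, beta: float = 0.99,
                 init_avg: float = 1.0, eps: float = 1e-8) -> None:
        self.alpha = alpha
        self.beta = beta      # EMA smoothing factor (assumed, not from the paper)
        self.eps = eps        # floor that keeps the weight finite
        self.avg: List[float] = [init_avg] * num_agents

    def adjust(self, rewards: Sequence[float]) -> List[float]:
        if len(rewards) != len(self.avg):
            raise ValueError("expected one reward per agent")
        adjusted = []
        for i, r in enumerate(rewards):
            # The weight uses statistics of past rewards only ...
            w = alpha_fair_weight(max(self.avg[i], self.eps), self.alpha)
            adjusted.append(w * r)
            # ... then the running average absorbs the new reward.
            self.avg[i] = self.beta * self.avg[i] + (1.0 - self.beta) * r
        return adjusted


# Usage: with alpha = 1 (proportional fairness), an agent that has historically
# earned 4x more reward has its new reward scaled down by a factor of 4.
adj = FairRewardAdjuster(num_agents=2, alpha=1.0)
adj.avg = [4.0, 1.0]
print(adj.adjust([4.0, 1.0]))  # [1.0, 1.0]
```

The adjusted rewards would then be fed to an ordinary actor-critic update in place of the raw rewards. Setting α = 0 makes every weight 1 and recovers plain sum-of-rewards maximization, α = 1 corresponds to proportional fairness, and large α approaches max-min fairness.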