
OPTIMAL POLICY LEARNING AND RECOMMENDATION FOR DISTRIBUTION TASK USING DEEP REINFORCEMENT LEARNING MODEL


Abstract

This disclosure relates to a method and system for optimal policy learning and recommendation for a distribution task using a deep reinforcement learning (RL) model, in applications where the action space has a probability simplex structure. The method includes training an RL agent by defining a policy network for learning the optimal policy using a policy gradient (PG) method, where the policy network comprises an artificial neural network (ANN) with a set of outputs. A continuous action space having a continuous probability simplex structure is defined. The learning of the optimal policy is updated based on either stochastic or deterministic PG. For stochastic PG, a Dirichlet-distribution-based stochastic policy is selected, parameterized by the output of the ANN with an activation function at the output layer of the ANN. For deterministic PG, a softmax function is selected as the activation function at the output layer of the ANN to maintain the probability simplex structure.
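The two action parameterizations named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say which activation produces the Dirichlet parameters, so softplus plus a small offset is assumed here, and the function names (`dirichlet_action`, `deterministic_action`) and the hard-coded `logits` standing in for ANN outputs are hypothetical. Only the Python standard library is used.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); the result is always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(logits):
    """Map raw outputs to a point on the probability simplex."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dirichlet_action(logits, rng, eps=1e-3):
    """Stochastic policy: sample an allocation from Dirichlet(alpha).

    Assumption: alpha_i = softplus(logit_i) + eps keeps every concentration
    parameter strictly positive; the source does not name the activation.
    """
    alpha = [softplus(z) + eps for z in logits]
    # Standard construction: normalize independent Gamma(alpha_i, 1) draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def deterministic_action(logits):
    """Deterministic policy: softmax output layer gives the allocation directly."""
    return softmax(logits)


rng = random.Random(0)
logits = [0.5, -1.0, 2.0]  # hypothetical stand-in for the ANN outputs
a_stochastic = dirichlet_action(logits, rng)
a_deterministic = deterministic_action(logits)
```

In both cases the resulting action is a vector of non-negative fractions that sum to one, which is what lets it be read as a distribution of a resource across options.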
