OPTIMAL POLICY LEARNING AND RECOMMENDATION FOR DISTRIBUTION TASK USING DEEP REINFORCEMENT LEARNING MODEL


Abstract

This disclosure relates to a method and system for optimal policy learning and recommendation for a distribution task using a deep reinforcement learning (RL) model, in applications where the action space has a probability simplex structure. The method includes training an RL agent by defining a policy network for learning the optimal policy using a policy gradient (PG) method, where the policy network comprises an artificial neural network (ANN) with a set of outputs. A continuous action space having a continuous probability simplex structure is defined. The learning of the optimal policy is updated based on either stochastic or deterministic PG. For stochastic PG, a Dirichlet distribution based stochastic policy is selected, parameterized by the output of the ANN with an activation function at the output layer of the ANN. For deterministic PG, a soft-max function is selected as the activation function at the output layer of the ANN to maintain the probability simplex structure.
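The two output-layer choices described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not name the activation used for the Dirichlet parameters, so softplus is assumed here, and the names `ann_outputs`, `min_alpha`, `stochastic_action`, and `deterministic_action` are hypothetical. The raw output values are made up for illustration.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x)); the result is non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(logits):
    """Map real-valued outputs to a point on the probability simplex."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dirichlet_sample(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def stochastic_action(ann_outputs, rng, min_alpha=0.05):
    """Stochastic PG case: ANN outputs parameterize a Dirichlet policy.

    Assumption: softplus plus a small floor (min_alpha, hypothetical)
    keeps every concentration parameter strictly positive.
    """
    alphas = [softplus(z) + min_alpha for z in ann_outputs]
    return dirichlet_sample(alphas, rng)


def deterministic_action(ann_outputs):
    """Deterministic PG case: soft-max at the output layer gives the action."""
    return softmax(ann_outputs)


# Hypothetical raw outputs of the policy network for a 3-way distribution task.
ann_outputs = [0.5, -1.2, 2.0]
rng = random.Random(0)

a_stoch = stochastic_action(ann_outputs, rng)
a_det = deterministic_action(ann_outputs)
```

In both cases the resulting action is a vector of non-negative fractions summing to one, so it can be read directly as an allocation across the available options without any clipping or renormalization after the fact.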


Bibliographic details

Publication number: US2022083842A1

Patent type:
Publication date: 2022-03-17

Original format: PDF
Applicant/Assignee: TATA CONSULTANCY SERVICES LIMITED

Application number: US202117213333
Inventors: AVINASH ACHAR; EASWARA SUBRAMANIAN; SANJAY PURUSHOTTAM BHAT; VIGNESH LAKSHMANAN KANGADHARAN PALANIRADJA

Filing date: 2021-03-26
Classification: G06N3/04; G06N3/08; G06N3/10
Country: US
Date added to database: 2022-08-24 23:54:17
