首页>
外国专利>
CONSTRAINED REINFORCEMENT LEARNING NEURAL NETWORK SYSTEMS USING PARETO FRONT OPTIMIZATION
CONSTRAINED REINFORCEMENT LEARNING NEURAL NETWORK SYSTEMS USING PARETO FRONT OPTIMIZATION
展开▼
机译:基于PARETO前沿优化的约束强化学习神经网络系统
展开▼
页面导航
摘要
著录项
相似文献
摘要
A system and method that controls an agent to perform a task subject to one or more constraints. The system trains a preference neural network that learns which preferences produce constraint-satisfying action selection policies. Thus the system optimizes a hierarchical policy that is a product of a preference policy and a preference-conditioned action selection policy. Thus the system learns to jointly optimize a set of objectives relating to rewards and costs received during the task whilst also learning preferences, i.e. trade-offs between the rewards and costs, that are most likely to produce policies that satisfy the constraints.
展开▼