
Rule creation using MDP and inverse reinforcement learning

Abstract

A method is provided for rule creation that includes receiving (i) an MDP model with a set of states, a set of actions, and a set of transition probabilities, (ii) a policy that corresponds to rules for a rule engine, and (iii) a set of candidate states that can be added to the set of states. The method includes transforming the MDP model to include a reward function using an inverse reinforcement learning process on the MDP model and on the policy. The method includes finding a state from the candidate states, and generating a refined MDP model with the reward function by updating the transition probabilities related to the state. The method includes obtaining an optimal policy for the refined MDP model with the reward function, based on the reward policy, the state, and the updated probabilities. The method includes updating the rule engine based on the optimal policy.
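The later steps of the abstract (refining the MDP with a candidate state, solving for an optimal policy, and turning that policy into rules) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy states `cruise`, `goal`, and `obstacle`, the actions `keep` and `brake`, the transition probabilities, the helper names `value_iteration`, `refine`, and `to_rules`, and the use of plain value iteration as the solver. The inverse reinforcement learning step is not implemented; the `reward` dictionary is a hand-written stand-in for whatever reward function that step would recover.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# transitions[(s, a)] maps each next state to its probability.
Transitions = Dict[Tuple[State, Action], Dict[State, float]]


def value_iteration(states: List[State], actions: List[Action],
                    transitions: Transitions, reward: Dict[State, float],
                    gamma: float = 0.9, tol: float = 1e-8) -> Dict[State, Action]:
    """Return a greedy policy {state: action} computed by value iteration."""
    values = {s: 0.0 for s in states}

    def q(s: State, a: Action) -> float:
        # State-based reward plus discounted expected value of the next state.
        return reward[s] + gamma * sum(
            p * values[t] for t, p in transitions[(s, a)].items())

    while True:
        delta = 0.0
        for s in states:
            best = max(q(s, a) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    return {s: max(actions, key=lambda a: q(s, a)) for s in states}


def refine(states: List[State], transitions: Transitions, new_state: State,
           updated: Transitions) -> Tuple[List[State], Transitions]:
    """Add new_state and overwrite the transition rows that involve it."""
    refined = {key: dict(row) for key, row in transitions.items()}
    refined.update(updated)
    for row in refined.values():
        # Every row must remain a probability distribution.
        assert abs(sum(row.values()) - 1.0) < 1e-9
    return states + [new_state], refined


def to_rules(policy: Dict[State, Action]) -> List[str]:
    """Render a policy as IF/THEN strings for a hypothetical rule engine."""
    return [f"IF state == {s!r} THEN {a}" for s, a in sorted(policy.items())]


ACTIONS = ["keep", "brake"]

# Original MDP (toy numbers, assumed).
base_states = ["cruise", "goal"]
base_transitions: Transitions = {
    ("cruise", "keep"): {"cruise": 0.2, "goal": 0.8},
    ("cruise", "brake"): {"cruise": 1.0},
    ("goal", "keep"): {"goal": 1.0},
    ("goal", "brake"): {"goal": 1.0},
}
# Stand-in for the reward function that IRL would recover (assumed).
reward = {"cruise": 0.0, "goal": 1.0}
base_policy = value_iteration(base_states, ACTIONS, base_transitions, reward)

# Refinement: add the candidate state and update the related probabilities.
updated_rows: Transitions = {
    ("cruise", "keep"): {"cruise": 0.2, "goal": 0.5, "obstacle": 0.3},
    ("obstacle", "keep"): {"obstacle": 0.9, "goal": 0.1},
    ("obstacle", "brake"): {"cruise": 1.0},
}
refined_states, refined_transitions = refine(
    base_states, base_transitions, "obstacle", updated_rows)
# The reward for the added state is also an assumption of this sketch.
refined_reward = {**reward, "obstacle": -1.0}

refined_policy = value_iteration(
    refined_states, ACTIONS, refined_transitions, refined_reward)
rules = to_rules(refined_policy)
```

In this toy setup the refined policy gains an entry for the added state, and `to_rules` shows one way such an entry might be pushed back into a rule engine. How the patent actually selects the candidate state and maps policies to rules is not specified in the abstract.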

Bibliographic details

Publication number: US11003998B2

Publication date: 2021-05-11

Original format: PDF
Applicant/Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Application number: US201715812002
Inventors: AKIRA KOSEKI; TETSURO MORIMURA; TOSHIRO TAKASE; HIROKI YANAGISAWA

Filing date: 2017-11-14
Classification: G06N5/04; G06N7; G06N5/02; B60W30/18; B60W10/18; B60W10/20; G06F17/16; G06N20; G05D1
Country: US
Database entry time: 2022-08-24 18:37:51