
METHOD, AND CONTROLLER AND CONTROL PROGRAM THEREOF, FOR UPDATING POLICY PARAMETERS UNDER MARKOV DECISION PROCESS SYSTEM ENVIRONMENT


Abstract

PROBLEM TO BE SOLVED: To implement a function for learning a decision-making model while suppressing an unnecessary increase in mixing time.

SOLUTION: In a technique for updating a parameter that defines a policy (a policy parameter) in a Markov decision process system environment, the policy parameter is updated according to an update equation. The update equation includes a term that decreases a weighted sum (the weighted expected hitting time sum), taken over a first state s and a second state s', of a statistic (the expected hitting time function) of the number of steps (the hitting time) needed to reach the second state s' from the first state s for the first time.
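The abstract does not state the update equation itself, so the following is only a minimal Python sketch of the quantity it targets, built on assumptions that are ours and not the patent's: a finite, irreducible MDP with known transition probabilities `T[s][a][s2]`, a tabular softmax policy with parameters `theta[s][a]`, the convention h(s', s') = 0, hypothetical weights `w[s][s2]`, and a central finite-difference gradient in place of whatever gradient the patent actually derives. The sketch forms the chain P_θ(s2 | s) = Σ_a π_θ(a | s) T(s2 | s, a) induced by the policy, obtains each expected hitting time from the linear system h(s, t) = 1 + Σ_{u ≠ t} P_θ(u | s) h(u, t) for s ≠ t, sums them with the weights into H(θ), and takes one gradient step that decreases H(θ). All function names are hypothetical.

```python
import math

# Hypothetical sketch: decrease the weighted expected hitting time sum H(theta)
# of the Markov chain induced by a tabular softmax policy (assumed, not from the patent).


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def softmax_policy(theta):
    """pi[s][a] = exp(theta[s][a]) / sum_b exp(theta[s][b]) (assumed parameterization)."""
    pi = []
    for row in theta:
        top = max(row)  # subtract the max for numerical stability
        e = [math.exp(v - top) for v in row]
        z = sum(e)
        pi.append([v / z for v in e])
    return pi


def induced_chain(theta, T):
    """P[s][s2] = sum_a pi(a|s) * T[s][a][s2], the chain the policy induces."""
    pi = softmax_policy(theta)
    n = len(T)
    return [[sum(pi[s][a] * T[s][a][s2] for a in range(len(T[s]))) for s2 in range(n)]
            for s in range(n)]


def expected_hitting_times(P):
    """h[s][t] = expected steps to first reach t from s; h[t][t] = 0 by assumption.

    Requires an irreducible chain; otherwise some hitting times are infinite
    and the linear system is singular.
    """
    n = len(P)
    h = [[0.0] * n for _ in range(n)]
    for t in range(n):
        others = [s for s in range(n) if s != t]
        # For s != t: h(s) - sum_{u != t} P[s][u] * h(u) = 1
        a = [[(1.0 if s == u else 0.0) - P[s][u] for u in others] for s in others]
        for s, v in zip(others, solve(a, [1.0] * len(others))):
            h[s][t] = v
    return h


def weighted_hitting_sum(theta, T, w):
    """H(theta) = sum over (s, s2) of w[s][s2] * h_theta(s, s2)."""
    h = expected_hitting_times(induced_chain(theta, T))
    n = len(h)
    return sum(w[s][t] * h[s][t] for s in range(n) for t in range(n))


def hitting_time_step(theta, T, w, lr=0.1, eps=1e-5):
    """One step theta <- theta - lr * dH/dtheta via central finite differences.

    Only the hitting-time term is shown; the reward term of a full update is omitted.
    """
    grad = [[0.0] * len(row) for row in theta]
    for s, row in enumerate(theta):
        for a in range(len(row)):
            plus = [r[:] for r in theta]
            minus = [r[:] for r in theta]
            plus[s][a] += eps
            minus[s][a] -= eps
            diff = weighted_hitting_sum(plus, T, w) - weighted_hitting_sum(minus, T, w)
            grad[s][a] = diff / (2 * eps)
    return [[theta[s][a] - lr * grad[s][a] for a in range(len(theta[s]))]
            for s in range(len(theta))]


if __name__ == "__main__":
    # Toy 3-state cycle: action 0 stays put, action 1 moves to (s + 1) % 3.
    T = [[[1.0 if s2 == s else 0.0 for s2 in range(3)],
          [1.0 if s2 == (s + 1) % 3 else 0.0 for s2 in range(3)]] for s in range(3)]
    w = [[1.0] * 3 for _ in range(3)]  # uniform weights (hypothetical)
    theta = [[0.0, 0.0] for _ in range(3)]
    print(weighted_hitting_sum(theta, T, w))  # prints 18.0
    theta = hitting_time_step(theta, T, w)
    print(weighted_hitting_sum(theta, T, w))  # smaller: the step favors "move"
```

In a complete update this hitting-time step would presumably be combined with the ordinary policy-gradient term for expected reward under some trade-off weight; the abstract specifies neither, so both are left out. Finite differences cost two evaluations of H(θ) per parameter, which is only practical for toy problems like the three-state cycle above.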
