Journal of Dynamic Systems, Measurement, and Control

A Real-Time Computational Learning Model for Sequential Decision-Making Problems Under Uncertainty

Abstract

Modeling dynamic systems subject to stochastic disturbances in order to derive a control policy is a ubiquitous task in engineering. However, in some instances, obtaining a model of a system may be impractical or impossible. Alternative approaches have been developed using a simulation-based stochastic framework, in which the system interacts with its environment in real time and obtains information that can be processed to produce an optimal control policy. In this context, the problem of developing a policy for controlling the system's behavior is formulated as a sequential decision-making problem under uncertainty. This paper considers the problem of deriving, in real time, a control policy for a dynamic system with unknown dynamics, formulated as a sequential decision-making problem under uncertainty. The evolution of the system is modeled as a controlled Markov chain. A new state-space representation model and a learning mechanism are proposed that can be used to improve system performance over time. The major difference between existing methods and the proposed learning model is that the latter uses an evaluation function that accounts for the expected cost achievable by state transitions forward in time. The model supports decision-making based on progressively improved knowledge of the system's response as the system transitions from one state to another, together with the actions taken at each state. The proposed model is demonstrated on the single cart-pole balancing problem and a vehicle cruise-control problem.
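
The abstract leaves the state-space representation and the evaluation function unspecified, so the Python sketch below is only a generic illustration of the ideas it names, under assumptions made here and not the authors' model: finite, discrete states and actions; transition probabilities and mean one-step costs of the controlled Markov chain estimated from counts as transitions are observed in real time; and an evaluation function that scores each action by its expected transition cost plus the discounted smallest expected cost available from the resulting state. The class `OnlineMarkovLearner`, the `discount` parameter, the optimistic zero score for untried actions, and the corridor task are all hypothetical.

```python
from collections import defaultdict


class OnlineMarkovLearner:
    """Hypothetical sketch, not the paper's model: learns a finite controlled
    Markov chain from observed transitions and picks actions with a one-step
    lookahead evaluation of expected cost (costs assumed non-negative)."""

    def __init__(self, actions, discount=0.9):
        self.actions = list(actions)
        self.discount = discount  # weight on the cost expected after the next transition
        # counts[(s, a)][s_next] = how often action a in state s led to s_next
        self.counts = defaultdict(lambda: defaultdict(int))
        # cost_sum[(s, a, s_next)] = accumulated one-step cost of that transition
        self.cost_sum = defaultdict(float)

    def observe(self, s, a, s_next, cost):
        """Record one real-time transition and its incurred cost."""
        self.counts[(s, a)][s_next] += 1
        self.cost_sum[(s, a, s_next)] += cost

    def transition_probs(self, s, a):
        """Empirical estimate of P(s_next | s, a); empty dict if (s, a) is untried."""
        row = self.counts.get((s, a), {})
        total = sum(row.values())
        return {sn: n / total for sn, n in row.items()} if total else {}

    def mean_cost(self, s, a, s_next):
        """Average observed cost of the transition s -> s_next under action a."""
        return self.cost_sum[(s, a, s_next)] / self.counts[(s, a)][s_next]

    def expected_cost(self, s, a):
        """Expected one-step cost of taking a in s under the current estimates."""
        probs = self.transition_probs(s, a)
        return sum(p * self.mean_cost(s, a, sn) for sn, p in probs.items())

    def evaluate(self, s, a):
        """Expected transition cost plus the discounted smallest expected one-step
        cost among the actions already tried in each possible next state."""
        probs = self.transition_probs(s, a)
        if not probs:
            return 0.0  # optimistic score so untried actions get explored
        total = 0.0
        for sn, p in probs.items():
            next_costs = [self.expected_cost(sn, b) for b in self.actions
                          if self.transition_probs(sn, b)]
            future = min(next_costs) if next_costs else 0.0
            total += p * (self.mean_cost(s, a, sn) + self.discount * future)
        return total

    def act(self, s):
        """Greedy action with the lowest evaluation (ties go to the first listed)."""
        return min(self.actions, key=lambda a: self.evaluate(s, a))


if __name__ == "__main__":
    # Hypothetical toy task: a 4-state corridor whose dynamics the learner does
    # not know. Entering state 3 costs 0; every other transition costs 1.
    def step(s, a):
        s_next = min(max(s + a, 0), 3)
        return s_next, (0.0 if s_next == 3 else 1.0)

    learner = OnlineMarkovLearner(actions=[-1, +1])
    s = 0
    for _ in range(200):
        a = learner.act(s)
        s_next, cost = step(s, a)
        learner.observe(s, a, s_next, cost)
        s = 0 if s_next == 3 else s_next  # restart the episode at the goal
    print({state: learner.act(state) for state in range(3)})
```

The lookahead here spans a single transition; a deeper lookahead, or a full value-iteration pass over the estimated chain, would be equally plausible choices that the abstract does not settle.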