Device and procedure for training a control strategy for a control device over multiple iterations
Abstract
A method is described for training a control strategy for a control device over multiple iterations. In each iteration, an exploration strategy is defined for the current version of the control strategy, and multiple simulation runs are conducted. In each simulation run, for each state in a sequence of states beginning with the initial state of the run, an action is selected according to the exploration strategy and checked for safety, until a safe action has been selected or a maximum number of actions, equal to two, has been selected. If a safe action has been selected, the subsequent state in the sequence is determined by simulating the execution of the selected action. If no safe action has been selected according to the strategy by the time the maximum number is reached, the simulation run is aborted or a specified safe action, if one exists, is selected, and the subsequent state in the sequence is determined by simulating the execution of that specified safe action. The sequence of states, together with the selected actions and the rewards received in those states, is collected as the data of the simulation run. For the iteration, the value of a loss function is determined over the data of the simulation runs performed, and the control strategy is adapted to a new version such that the loss function value is reduced.
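The action-selection and data-collection steps of a single simulation run can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `select_safe_action`, `run_simulation`, `toy_is_safe` and `toy_step`, the `horizon` parameter, and the one-dimensional toy environment are all assumptions made for illustration. The loss computation and strategy update are left out, since the abstract does not specify them.

```python
import random
from typing import Callable, List, Optional, Tuple

State = int
Action = int
Transition = Tuple[State, Action, float]  # (state, selected action, reward)


def select_safe_action(
    state: State,
    explore: Callable[[State], Action],
    is_safe: Callable[[State, Action], bool],
    max_tries: int = 2,
    fallback: Optional[Callable[[State], Action]] = None,
) -> Optional[Action]:
    """Propose up to max_tries actions and return the first safe one.

    If none is safe, return the specified fallback action (assumed to be
    safe by construction), or None when no fallback is available.
    """
    for _ in range(max_tries):
        action = explore(state)
        if is_safe(state, action):
            return action
    return fallback(state) if fallback is not None else None


def run_simulation(
    initial_state: State,
    explore: Callable[[State], Action],
    is_safe: Callable[[State, Action], bool],
    step: Callable[[State, Action], Tuple[State, float]],
    horizon: int,
    max_tries: int = 2,
    fallback: Optional[Callable[[State], Action]] = None,
) -> List[Transition]:
    """Collect (state, action, reward) tuples for one simulation run."""
    data: List[Transition] = []
    state = initial_state
    for _ in range(horizon):
        action = select_safe_action(state, explore, is_safe, max_tries, fallback)
        if action is None:
            break  # no safe action and no fallback: abort the run
        next_state, reward = step(state, action)
        data.append((state, action, reward))
        state = next_state
    return data


# Hypothetical toy environment: positions 0..4, actions -1 or +1.
def toy_is_safe(state: State, action: Action) -> bool:
    return 0 <= state + action <= 4


def toy_step(state: State, action: Action) -> Tuple[State, float]:
    next_state = state + action
    return next_state, 1.0 if next_state == 4 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    run = run_simulation(0, lambda s: rng.choice([-1, 1]),
                         toy_is_safe, toy_step, horizon=10)
    print(run)
```

In a full training iteration, the data from many such runs would be pooled, a loss evaluated over it, and the control strategy updated to reduce that loss. Without a fallback, a run may end early whenever both proposals at a state are unsafe.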