Journal of Intelligent Systems

Q-Learning Applied to Genetic Algorithm-Fuzzy Approach for On-Line Control in Autonomous Agents



Abstract

This paper proposes an adaptive approach, based on a search of the state-action space, for learning and planning an optimal policy with direct reinforcement learning. The fuzzy rule-based system is optimized by combining genetic algorithms with Q-learning, which yields an agent-based predicting machine with the desired performance. The heuristic search method finds the optimal solution without evaluating the entire state-action space, which gives substantial computational savings when the state-action space is large. Genetic algorithms generate and select a suitably small number of subsets of fuzzy If-Then rules, and the action-value function (the credit assigned to the fuzzy rules) is learned with a Q-learning scheme. The selected rules are treated as strings (individuals), to which genetic operations such as selection, crossover, and mutation are applied. The experienced trajectory is obtained by learning the action-value function toward its optimal value. The evolution forms an explicit algorithm, which makes decisions that bias the proposed result. The evolved autonomous agents are capable of learning and establishing, primarily, action-effect relations without any prior knowledge, even for control problems that involve a large number of stimuli. To illustrate the validity of the described technique in control applications, the approach is evaluated on the acrobot task.
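The two ingredients named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the rule encoding, the fitness function, or the fuzzy inference step. The sketch assumes a tabular action-value function, a bit-string individual in which each bit marks whether a candidate fuzzy rule belongs to the subset, and tournament selection as one possible selection operator. All function names and the default values of `alpha` and `gamma` are hypothetical.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (assumed form of the credit update):
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def crossover(parent_a, parent_b, point):
    """One-point crossover on two equal-length bit strings (lists of 0/1)."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def select(population, fitness, rng, k=2):
    """Tournament selection (an assumption; the abstract only says
    'selection'): return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]
```

In a full system, the fitness of an individual would presumably be derived from the learned Q-values of the rules it selects, but that coupling is not specified in the abstract and is left out here.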
