IEEE Transactions on Fuzzy Systems

Combination of Online Clustering and Q-Value Based GA for Reinforcement Fuzzy System Design

Abstract

This paper proposes CQGAF, a learning scheme for reinforcement fuzzy system design that combines online clustering with a Q-value based genetic algorithm (GA). CQGAF carries out GA-based fuzzy system design in a reinforcement learning environment where only weak reinforcement signals, such as "success" and "failure", are available. CQGAF starts with no fuzzy rules; they are generated automatically. The precondition part of the fuzzy system is constructed online by an aligned clustering-based approach, which yields a flexible partition. The consequent part is then designed by Q-value based genetic reinforcement learning. Each individual in the GA population encodes the consequent part parameters of a fuzzy system and is associated with a Q-value. The Q-value estimates the discounted cumulative reinforcement obtained by the individual and serves as its fitness value for GA evolution. At each time step, an individual is selected according to the Q-values, the corresponding fuzzy system is built and applied to the environment, and a critic signal is received. Using this critic signal, Q-learning with an eligibility trace is executed. After each trial, the GA searches for better consequent parameters based on the learned Q-values. Thus, in CQGAF, evolution is performed immediately after the end of each trial, in contrast to a general GA, where many trials are performed before evolution. The feasibility of CQGAF is demonstrated through simulations of cart-pole balancing, magnetic levitation, and chaotic system control problems with only binary reinforcement signals.
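The loop described in the abstract (select an individual by Q-value, apply its fuzzy system, update Q-values with an eligibility trace, evolve after each trial) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the Boltzmann selection rule, the trace-based update, the tournament/crossover/mutation operators, the averaging of parent Q-values for children, the fixed two-rule precondition part (standing in for the online clustering step), and the toy one-dimensional plant are all assumptions made for illustration.

```python
import math
import random

CENTERS = (-0.5, 0.5)  # assumed fixed rule centers (the paper builds these by online clustering)
WIDTH = 0.5            # assumed shared Gaussian width

def fuzzy_output(consequents, x):
    """Zero-order TSK output: firing-strength-weighted average of rule consequents."""
    strengths = [math.exp(-((x - c) / WIDTH) ** 2) for c in CENTERS]
    return sum(s * w for s, w in zip(strengths, consequents)) / sum(strengths)

def select(q, temperature, rng):
    """Boltzmann selection over Q-values (an assumed selection rule)."""
    top = max(q)
    weights = [math.exp((v - top) / temperature) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]

def q_step(q, trace, chosen, reward, terminal, alpha=0.2, gamma=0.95, lam=0.7):
    """One Q-learning update with accumulating eligibility traces over individuals."""
    for i in range(len(trace)):
        trace[i] *= gamma * lam
    trace[chosen] += 1.0
    target = reward if terminal else reward + gamma * max(q)
    delta = target - q[chosen]
    for i in range(len(q)):
        q[i] += alpha * delta * trace[i]

def run_trial(population, q, rng, max_steps=200):
    """Hypothetical toy plant: x drifts away from 0; the trial fails when |x| > 1."""
    x = rng.uniform(-0.2, 0.2)
    trace = [0.0] * len(q)
    for step in range(max_steps):
        chosen = select(q, 0.1, rng)
        u = fuzzy_output(population[chosen], x)
        x = x + 0.1 * (x + u)  # unstable without control
        failed = abs(x) > 1.0
        # Binary reinforcement: -1 on failure, 0 otherwise.
        q_step(q, trace, chosen, -1.0 if failed else 0.0, failed)
        if failed:
            return step + 1
    return max_steps

def evolve(population, q, rng, mutation_std=0.1, elite=1):
    """GA step using Q-values as fitness: elitism, tournaments, crossover, mutation."""
    n = len(population)
    order = sorted(range(n), key=lambda i: q[i], reverse=True)
    new_pop = [list(population[i]) for i in order[:elite]]
    new_q = [q[i] for i in order[:elite]]
    while len(new_pop) < n:
        a = max(rng.sample(range(n), 2), key=lambda i: q[i])
        b = max(rng.sample(range(n), 2), key=lambda i: q[i])
        genes = len(population[a])
        cut = rng.randrange(1, genes) if genes > 1 else 0
        child = population[a][:cut] + population[b][cut:]
        new_pop.append([g + rng.gauss(0.0, mutation_std) for g in child])
        new_q.append(0.5 * (q[a] + q[b]))  # assumed: child inherits mean parent Q-value
    return new_pop, new_q

def train(trials=30, pop_size=8, seed=0):
    """Evolve immediately after every trial, as the abstract describes."""
    rng = random.Random(seed)
    population = [[rng.uniform(-2, 2) for _ in CENTERS] for _ in range(pop_size)]
    q = [0.0] * pop_size
    history = []
    for _ in range(trials):
        history.append(run_trial(population, q, rng))
        population, q = evolve(population, q, rng)
    return history, population, q
```

Calling `history, population, q = train()` returns the number of steps survived in each trial together with the final population and its Q-values. The point of the sketch is the ordering of operations: Q-values are updated at every time step inside a trial, and the GA runs once per trial using those Q-values as fitness.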
