Journal: IEEE Transactions on Evolutionary Computation

Accuracy-Based Learning Classifier Systems for Multistep Reinforcement Learning: A Fuzzy Logic Approach to Handling Continuous Inputs and Learning Continuous Actions

Abstract

Despite their proven effectiveness, many Michigan learning classifier systems (LCSs) cannot perform multistep reinforcement learning in continuous spaces. To meet this technical challenge, some LCSs have been designed to learn fuzzy logic rules. These can be broadly classified into strength-based and accuracy-based systems, and the latter have gained more research attention over the last decade. However, existing accuracy-based learning systems either primarily address single-step learning problems or require the action space to be discrete. In this paper, a new accuracy-based learning fuzzy classifier system (LFCS) is developed to explicitly handle continuous state input and continuous action output during multistep reinforcement learning. Several technical improvements have been achieved while developing the new learning algorithm. In particular, we have successfully extended Q-learning-like credit assignment methods to continuous spaces. To enable direct learning of stochastic strategies for action selection, we also propose a new fuzzy logic system with stochastic action outputs. Moreover, fine-grained learning of fuzzy rules is achieved effectively in our algorithm by using a natural gradient learning method. This is the first time that these techniques have been used substantially in any accuracy-based LFCS. In comparison with several recently proposed learning algorithms, our algorithm is shown to perform highly competitively on four benchmark learning problems and a robotics problem. The practical usefulness of our algorithm is also demonstrated by improving the performance of a wireless body area network.
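
The abstract does not give the construction of the fuzzy logic system with stochastic action outputs, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes Gaussian membership functions over the continuous state and a Gaussian action distribution attached to each rule. The names `FuzzyRule`, `normalized_strengths`, and `sample_action` are hypothetical.

```python
import math
import random


class FuzzyRule:
    """Hypothetical fuzzy rule: Gaussian membership over the state,
    Gaussian distribution over a one-dimensional continuous action."""

    def __init__(self, center, width, action_mean, action_std):
        self.center = center            # membership centers, one per state dimension
        self.width = width              # membership widths, one per state dimension
        self.action_mean = action_mean  # mean of the rule's action distribution
        self.action_std = action_std    # spread of the rule's action distribution

    def firing_strength(self, state):
        # Product of per-dimension Gaussian memberships (an assumed choice).
        return math.prod(
            math.exp(-0.5 * ((s - c) / w) ** 2)
            for s, c, w in zip(state, self.center, self.width)
        )


def normalized_strengths(rules, state):
    """Firing strengths scaled to sum to 1; uniform if no rule fires."""
    strengths = [rule.firing_strength(state) for rule in rules]
    total = sum(strengths)
    if total == 0.0:
        return [1.0 / len(rules)] * len(rules)
    return [s / total for s in strengths]


def sample_action(rules, state, rng):
    """Pick a rule in proportion to its normalized firing strength,
    then draw a continuous action from that rule's Gaussian."""
    weights = normalized_strengths(rules, state)
    rule = rng.choices(rules, weights=weights, k=1)[0]
    return rng.gauss(rule.action_mean, rule.action_std)
```

With two rules centered at states 0 and 1, a state near 0 mostly triggers the first rule, so sampled actions cluster around that rule's mean while remaining stochastic. A learning method such as the natural gradient approach mentioned in the abstract would adjust the rule parameters; that part is not sketched here.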