Kernel-Based Continuous-Action Actor-Critic Learning

陈兴国; 高阳; 范顺国; 俞亚君

Abstract

Reinforcement learning algorithms often have to handle continuous state spaces and continuous action spaces at the same time in order to achieve accurate control. This paper combines the strength of kernel methods on continuous state space problems with the strength of the actor-critic method on continuous action space problems, and proposes kernel-based continuous-action actor-critic learning (KCACL). In KCACL, the actor updates each action probability according to the reward-inaction principle, and the critic learns the state value function with online selective kernel-based temporal difference (OSKTD) learning. Comparative experiments demonstrate the effectiveness of the proposed algorithm.
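The actor and critic roles described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the update rules, so everything below is an assumption made for illustration. The actor uses the classical linear reward-inaction scheme from learning automata over a small finite set of candidate actions, the critic is a Gaussian-kernel TD(0) learner on a scalar state that stores a new sample only when it lies farther than a threshold from every stored one, and a positive TD error is treated as the reward signal. The names `reward_inaction_update` and `KernelTDCritic`, and all parameter values, are hypothetical.

```python
import math


def reward_inaction_update(probs, action, rewarded, alpha=0.1):
    """Linear reward-inaction step: move mass toward `action` only when rewarded."""
    if not rewarded:
        return list(probs)  # inaction: a penalty leaves the probabilities unchanged
    return [p + alpha * (1.0 - p) if i == action else p * (1.0 - alpha)
            for i, p in enumerate(probs)]


class KernelTDCritic:
    """TD(0) estimate V(s) = sum_i w_i * k(s, c_i) over a sparse set of centers."""

    def __init__(self, width=0.5, threshold=0.2, lr=0.1, gamma=0.9):
        self.centers, self.weights = [], []
        self.width, self.threshold, self.lr, self.gamma = width, threshold, lr, gamma

    def kernel(self, a, b):
        # Gaussian kernel on scalar states (assumed kernel choice).
        return math.exp(-((a - b) ** 2) / (2.0 * self.width ** 2))

    def value(self, s):
        return sum(w * self.kernel(s, c) for w, c in zip(self.weights, self.centers))

    def update(self, s, reward, s_next):
        # Selective step (assumed rule): store s only if it is far from every center.
        if all(abs(s - c) > self.threshold for c in self.centers):
            self.centers.append(s)
            self.weights.append(0.0)
        td_error = reward + self.gamma * self.value(s_next) - self.value(s)
        for i, c in enumerate(self.centers):
            self.weights[i] += self.lr * td_error * self.kernel(s, c)
        return td_error


# One interaction step: the critic's TD error decides whether the actor is rewarded.
critic = KernelTDCritic()
probs = [0.25, 0.25, 0.25, 0.25]
delta = critic.update(0.0, 1.0, 0.1)
probs = reward_inaction_update(probs, action=2, rewarded=delta > 0)
```

Because the reward-inaction step only redistributes probability mass, the action probabilities keep summing to one, and the distance threshold keeps the critic's set of stored states from growing with every sample.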

Bibliographic details

Source
《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence) | 2014, No. 2 | pp. 103-110 | 8 pages
Authors
陈兴国; 高阳; 范顺国; 俞亚君
Author affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093;

Department of Computer Science and Technology, Nanjing University, Nanjing 210093

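Original format: PDF
Language of full text: Chinese
Chinese Library Classification: automated reasoning, machine learning
Keywords
reinforcement learning; continuous action space; function approximation; kernel methods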

