《电子学报》 (Acta Electronica Sinica)

A New Fast Sarsa Algorithm Based on Value Function Transfer

         

Abstract

Knowledge transfer has become a research hot spot in machine learning. Its basic idea is to transfer experience from historical tasks to a target task in order to speed up convergence and improve the performance of learning algorithms. To address the slow convergence of classical reinforcement learning algorithms, this paper proposes transferring value function information during learning between similar tasks that share the same state space and action space, so as to reduce the number of samples needed in the target task and accelerate convergence. Based on the framework of the classical on-policy Sarsa algorithm, combined with a value function transfer method that optimizes the initialization of the value function, this paper puts forward a novel fast Sarsa algorithm based on value function transfer, VFT-Sarsa. In the early stage of execution, the algorithm uses a bisimulation metric to measure the distance between states of the target task and states of the historical task, under the condition that the two tasks share the same state space and action space; for similar states whose distance satisfies a given condition, it transfers the value function, and then runs the learning algorithm. Finally, VFT-Sarsa is applied to the Random Walk problem and compared with the classical Sarsa algorithm, Q-learning, and the QV algorithm, which has a good convergence rate. The experimental results show that the proposed algorithm converges faster while maintaining convergence accuracy.
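The abstract describes VFT-Sarsa only at a high level. The sketch below is one possible reading of that description, not the paper's implementation: a tabular Sarsa learner on a Random Walk chain whose Q-table is initialized from a historical task's Q-values for states judged similar enough. The `distance` placeholder standing in for the bisimulation metric, the `threshold`, the chain size, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch of value-function transfer into Sarsa on a Random Walk
# chain (assumed setup, not the paper's code).
import random

N_STATES = 19                 # chain states 0..18; 0 and 18 are terminal
ACTIONS = [-1, +1]            # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1

def choose(q, s):
    # epsilon-greedy action selection over the tabular Q-values
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(s, a)])

def sarsa(q, episodes, start=N_STATES // 2):
    # standard tabular on-policy Sarsa; reward 1 only at the right terminal
    for _ in range(episodes):
        s, a = start, choose(q, start)
        while 0 < s < N_STATES - 1:
            s2 = s + a
            r = 1.0 if s2 == N_STATES - 1 else 0.0
            a2 = choose(q, s2)
            bootstrap = GAMMA * q[(s2, a2)] if 0 < s2 < N_STATES - 1 else 0.0
            q[(s, a)] += ALPHA * (r + bootstrap - q[(s, a)])
            s, a = s2, a2
    return q

def fresh_q():
    return {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def transfer(q_hist, distance, threshold=0.1):
    # copy historical Q-values only for states whose distance to the
    # corresponding target-task state is below the threshold; `distance`
    # is a stand-in for the bisimulation metric used in the paper
    q = fresh_q()
    for s in range(N_STATES):
        if distance(s) <= threshold:
            for a in ACTIONS:
                q[(s, a)] = q_hist[(s, a)]
    return q

# historical (source) task: learn a Q-table from scratch
q_hist = sarsa(fresh_q(), episodes=500)
# target task: initialize from the transferred values, then keep learning
q_target = sarsa(transfer(q_hist, distance=lambda s: 0.0), episodes=100)
```

Note that the transfer step only sets better initial Q-values; the subsequent Sarsa updates are unchanged, which is consistent with the abstract's claim that convergence accuracy is preserved while fewer samples are needed when the transferred values are close to those of the target task.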

