2011 IEEE Conference on Computational Intelligence and Games

Temporal difference learning with interpolated n-tuples: Initial results from a simulated car racing environment

Abstract

Evolutionary algorithms have been used successfully in car racing game competitions, such as those based on TORCS. This is in contrast to temporal difference learning (TDL), which, despite being a powerful learning algorithm, has not been used to any significant extent in these competitions. We believe this is mainly due to the difficulty of choosing a good function approximator, the potential instability of the learning behaviour (and hence the unreliability of the results), and the lack of a forward model, which restricts the choice of TDL algorithms. This paper reports our initial results with a new type of function approximator designed to be used with TDL on problems with a large number of continuous-valued inputs, where function approximators such as multi-layer perceptrons can be unstable. The approach combines interpolated tables with n-tuple systems. To conduct the research in a flexible and efficient way, we developed a new car-racing simulator that runs much more quickly than TORCS and gives us full access to the system's forward model. We investigate different types of tracks and physics models, compare against human drivers, and run some initial tests with evolutionary learning (EL). The results show that each approach leads to a different driving style, and that either TDL or EL can learn best depending on the details of the environment. Significantly, TDL produced its best results when learning state-action values (similar to Q-learning; no forward model is needed). Regarding driving style, TDL consistently learned behaviours that avoid damage, while EL tended to evolve fast but reckless drivers.
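To make the approach concrete, below is a minimal sketch of the kind of learner the abstract describes: Q-learning-style TDL over state-action values, with the value function represented by interpolated n-tuple tables. It assumes state inputs rescaled to [0, 1] and a small discrete action set; the class names (InterpolatedNTuple, NTupleQ), the tuple layout, and all hyper-parameter values are illustrative assumptions, not the paper's implementation.

    import itertools
    import random

    import numpy as np


    class InterpolatedNTuple:
        """One n-tuple: a value grid over n chosen input dimensions, read and
        written with multilinear interpolation over the 2^n enclosing cells."""

        def __init__(self, dims, cells_per_dim):
            self.dims = dims            # indices of the inputs this tuple samples
            self.cells = cells_per_dim  # grid resolution per sampled dimension
            self.table = np.zeros((cells_per_dim + 1,) * len(dims))

        def _corners(self, x):
            # Yield (grid index, interpolation weight) for the corners around x.
            pos = np.clip(x[self.dims], 0.0, 1.0) * self.cells
            base = np.minimum(pos.astype(int), self.cells - 1)
            frac = pos - base
            for offset in itertools.product((0, 1), repeat=len(self.dims)):
                idx = tuple(base + np.array(offset))
                weight = np.prod(np.where(offset, frac, 1.0 - frac))
                yield idx, weight

        def value(self, x):
            return sum(self.table[idx] * w for idx, w in self._corners(x))

        def update(self, x, delta):
            # Spread the TD error over the corner cells with the same
            # coefficients used for the interpolated read.
            for idx, w in self._corners(x):
                self.table[idx] += delta * w


    class NTupleQ:
        """Q(s, a) represented as one bank of interpolated n-tuples per action."""

        def __init__(self, n_inputs, n_actions, n_tuples=10, tuple_len=2,
                     cells=8, alpha=0.1, gamma=0.95, seed=0):
            rng = random.Random(seed)
            self.alpha, self.gamma = alpha, gamma
            self.banks = [[InterpolatedNTuple(
                               np.array(rng.sample(range(n_inputs), tuple_len)),
                               cells)
                           for _ in range(n_tuples)]
                          for _ in range(n_actions)]

        def q(self, s, a):
            return sum(t.value(s) for t in self.banks[a])

        def td_update(self, s, a, r, s_next, terminal):
            # Q-learning-style target: bootstraps from observed transitions,
            # so no forward model of the car dynamics is required.
            best_next = 0.0 if terminal else max(
                self.q(s_next, b) for b in range(len(self.banks)))
            delta = self.alpha * (r + self.gamma * best_next - self.q(s, a))
            for t in self.banks[a]:
                t.update(s, delta / len(self.banks[a]))

A driving agent would call td_update(s, a, r, s_next, terminal) once per simulation step, with a reward of the implementer's choosing (e.g. track progress minus a damage penalty), and select actions epsilon-greedily over q(s, a). Because the update bootstraps only from observed transitions, it matches the abstract's point that no forward model is needed.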
