2011 IEEE Conference on Computational Intelligence and Games

Temporal difference learning with interpolated n-tuples: Initial results from a simulated car racing environment

Abstract

Evolutionary algorithms have been used successfully in car racing game competitions, such as those based on TORCS. This is in contrast to temporal difference learning (TDL), which, despite being a powerful learning algorithm, has not been used to any significant extent in these competitions. We believe this is mainly due to the difficulty of choosing a good function approximator, the potential instability of the learning behaviour (and hence the unreliability of the results), and the lack of a forward model, which restricts the choice of TDL algorithms. This paper reports our initial results with a new type of function approximator designed to be used with TDL on problems with a large number of continuous-valued inputs, where function approximators such as multi-layer perceptrons can be unstable. The approach combines interpolated tables with n-tuple systems. To conduct the research in a flexible and efficient way, we developed a new car-racing simulator that runs much more quickly than TORCS and gives us full access to the system's forward model. We investigate different types of tracks and physics models, compare against human drivers, and run some initial tests with evolutionary learning (EL). The results show that each approach leads to a different driving style, and that either TDL or EL can learn best depending on the details of the environment. Significantly, TDL produced its best results when learning state-action values (similar to Q-learning; no forward model is needed). Regarding driving style, TDL consistently learned behaviours that avoid damage, while EL tended to evolve fast but reckless drivers.
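To make the approach concrete, below is a minimal sketch of the kind of learner the abstract describes: Q-learning-style TDL over state-action values, with the value function represented by interpolated n-tuple tables. It assumes state inputs rescaled to [0, 1] and a small discrete action set; the class names (InterpolatedNTuple, NTupleQ), the tuple layout, and all hyper-parameter values are illustrative assumptions, not the paper's implementation.

    import itertools
    import random

    import numpy as np


    class InterpolatedNTuple:
        """One n-tuple: a value grid over n chosen input dimensions, read and
        written with multilinear interpolation over the 2^n enclosing cells."""

        def __init__(self, dims, cells_per_dim):
            self.dims = dims            # indices of the inputs this tuple samples
            self.cells = cells_per_dim  # grid resolution per sampled dimension
            self.table = np.zeros((cells_per_dim + 1,) * len(dims))

        def _corners(self, x):
            # Yield (grid index, interpolation weight) for the corners around x.
            pos = np.clip(x[self.dims], 0.0, 1.0) * self.cells
            base = np.minimum(pos.astype(int), self.cells - 1)
            frac = pos - base
            for offset in itertools.product((0, 1), repeat=len(self.dims)):
                idx = tuple(base + np.array(offset))
                weight = np.prod(np.where(offset, frac, 1.0 - frac))
                yield idx, weight

        def value(self, x):
            return sum(self.table[idx] * w for idx, w in self._corners(x))

        def update(self, x, delta):
            # Spread the TD error over the corner cells with the same
            # coefficients used for the interpolated read.
            for idx, w in self._corners(x):
                self.table[idx] += delta * w


    class NTupleQ:
        """Q(s, a) represented as one bank of interpolated n-tuples per action."""

        def __init__(self, n_inputs, n_actions, n_tuples=10, tuple_len=2,
                     cells=8, alpha=0.1, gamma=0.95, seed=0):
            rng = random.Random(seed)
            self.alpha, self.gamma = alpha, gamma
            self.banks = [[InterpolatedNTuple(
                               np.array(rng.sample(range(n_inputs), tuple_len)),
                               cells)
                           for _ in range(n_tuples)]
                          for _ in range(n_actions)]

        def q(self, s, a):
            return sum(t.value(s) for t in self.banks[a])

        def td_update(self, s, a, r, s_next, terminal):
            # Q-learning-style target: bootstraps from observed transitions,
            # so no forward model of the car dynamics is required.
            best_next = 0.0 if terminal else max(
                self.q(s_next, b) for b in range(len(self.banks)))
            delta = self.alpha * (r + self.gamma * best_next - self.q(s, a))
            for t in self.banks[a]:
                t.update(s, delta / len(self.banks[a]))

A driving agent would call td_update(s, a, r, s_next, terminal) once per simulation step, with a reward of the implementer's choosing (e.g. track progress minus a damage penalty), and select actions epsilon-greedily over q(s, a). Because the update bootstraps only from observed transitions, it matches the abstract's point that no forward model is needed.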
