Venue: Annual Conference on Neural Information Processing Systems

Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

Abstract

Tetris is a video game that has been widely used as a benchmark for various optimization techniques, including approximate dynamic programming (ADP) algorithms. A survey of the literature on this game shows that ADP algorithms based (almost) entirely on approximating the value function (value-function-based methods) have performed poorly in Tetris, whereas methods that search directly in the space of policies, learning the policy parameters with a black-box optimizer such as the cross-entropy (CE) method, have achieved the best reported results. This leads us to conjecture that Tetris is a game in which good policies are easier to represent, and thus to learn, than their corresponding value functions. To obtain good performance with ADP, we should therefore use ADP algorithms that search in a policy space rather than the more traditional ones that search in a value function space. In this paper, we put this conjecture to the test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris. Our experimental results show that, for the first time, an ADP algorithm, namely CBMPI, obtains the best results reported in the literature for Tetris on both the small 10 × 10 and the large 10 × 20 boards. Although CBMPI's results are similar to those of the CE method on the large board, CBMPI uses considerably fewer samples (calls to the generative model) than CE: almost 1/6 as many.
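
The black-box policy search mentioned in the abstract can be illustrated with a minimal sketch of the cross-entropy (CE) method. This is not the implementation evaluated in the paper. It assumes a diagonal Gaussian over the policy parameters, and the names `cross_entropy_search` and `toy_score`, the hyperparameter values, and the small constant added to the standard deviation are all hypothetical choices made for illustration. In a Tetris setting, the score function would be the average game score of the policy defined by the parameter vector; here a simple quadratic stands in for it.

```python
import random
import statistics


def cross_entropy_search(score, dim, iterations=30, population=50,
                         elite_frac=0.2, seed=0):
    """Black-box CE search for a parameter vector of length `dim`.

    `score` maps a parameter vector to a number; higher is better.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    n_elite = max(2, int(population * elite_frac))

    for _ in range(iterations):
        # Sample candidate parameter vectors from the current Gaussian.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(population)
        ]
        # Keep the highest-scoring candidates (the "elite" set).
        elite = sorted(candidates, key=score, reverse=True)[:n_elite]
        # Refit the Gaussian to the elite set, one dimension at a time.
        columns = list(zip(*elite))
        mean = [statistics.fmean(col) for col in columns]
        # The added constant (an assumption) keeps the search from
        # collapsing to zero variance too early.
        std = [statistics.pstdev(col) + 1e-3 for col in columns]

    return mean


def toy_score(theta):
    # Hypothetical stand-in for "average game score of the policy with
    # weights theta": a quadratic that peaks at an arbitrary target.
    target = [1.0, -2.0, 0.5]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


best = cross_entropy_search(toy_score, dim=3)
```

Each call to `score` corresponds to one evaluation of a candidate policy, so the product of `iterations` and `population` gives a rough picture of the sample cost that the abstract compares between CE and CBMPI.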
