Approximate Dynamic Programming Finally Performs Well in the Game of Tetris


Abstract

Tetris is a video game that has been widely used as a benchmark for various optimization techniques including approximate dynamic programming (ADP) algorithms. A look at the literature of this game shows that while ADP algorithms that have been (almost) entirely based on approximating the value function (value function based) have performed poorly in Tetris, the methods that search directly in the space of policies by learning the policy parameters using an optimization black box, such as the cross entropy (CE) method, have achieved the best reported results. This makes us conjecture that Tetris is a game in which good policies are easier to represent, and thus, learn than their corresponding value functions. So, in order to obtain a good performance with ADP, we should use ADP algorithms that search in a policy space, instead of the more traditional ones that search in a value function space. In this paper, we put our conjecture to test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris. Our experimental results show that for the first time an ADP algorithm, namely CBMPI, obtains the best results reported in the literature for Tetris in both small 10 × 10 and large 10 × 20 boards. Although the CBMPI's results are similar to those of the CE method in the large board, CBMPI uses considerably fewer (almost 1/6) samples (calls to the generative model) than CE.
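The abstract refers to searching directly in policy-parameter space with a black-box optimizer such as the cross entropy (CE) method. A minimal sketch of the generic CE loop is given below for orientation only. It is not the authors' implementation: the function names, the Gaussian sampling distribution, the hyperparameters, and the toy score (a stand-in for the average game score of a policy with given weights) are all assumptions made for illustration.

```python
import random
import statistics


def cross_entropy_search(score, dim, iterations=30, population=100,
                         elite_frac=0.2, seed=0):
    """Generic cross-entropy search over a real-valued parameter vector.

    `score` maps a list of `dim` floats to a number to be maximized.
    All defaults here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [5.0] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate parameter vectors from the current Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the highest-scoring candidates (the "elite" set).
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the Gaussian to the elite set, one coordinate at a time.
        # The small constant keeps the spread from collapsing to zero.
        mean = [statistics.fmean(col) for col in zip(*elite)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elite)]
    return mean


# Hypothetical toy objective: in a Tetris setting, this would be replaced by
# the (noisy) average score of games played with policy weights `theta`.
TARGET = [1.0, -2.0, 0.5]


def toy_score(theta):
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


best = cross_entropy_search(toy_score, dim=3)
```

In this sketch each call to `score` plays the role of the evaluations that the abstract counts as samples, which is why the number of candidates per iteration dominates the cost of the method.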


Bibliographic record

Source
Annual conference on Neural Information Processing Systems | 2013 | 9 pages
Authors
Victor Gabillon; Mohammad Ghavamzadeh; Bruno Scherrer
Original format: PDF
Chinese Library Classification: information processing

