Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage

Moazehi Somayeh; Scott Warren R.; Powell Warren B.


Abstract

This article studies least-squares approximate policy iteration (API) methods with parametrized value-function approximation. We study four variations of the policy evaluation phase: Bellman error minimization, Bellman error minimization with instrumental variables, projected Bellman error minimization, and projected Bellman error minimization with instrumental variables. For a general discrete-time stochastic control problem, Bellman error minimization policy evaluation using instrumental variables is equivalent to both variants of projected Bellman error minimization. An alternative to these API methods is direct policy search based on the knowledge gradient. The practical performance of three approximate dynamic programming methods, (i) least-squares API with Bellman error minimization, (ii) least-squares API with Bellman error minimization with instrumental variables, and (iii) direct policy search, is investigated in an application to energy storage operations management. We create a library of test problems using real-world data and apply value iteration to find their optimal policies. These optimal benchmarks are then used to evaluate the approximate dynamic programming policies. Our analysis indicates that least-squares API with instrumental-variables Bellman error minimization markedly outperforms least-squares API with plain Bellman error minimization. However, both of these approaches underperform our direct policy search implementation.
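To make the difference between the two policy evaluation variants named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear architecture V(s) ≈ φ(s)ᵀθ, a fixed policy, and a batch of sampled transitions. The function names, the array layout, and the use of rewards rather than costs are illustrative assumptions, as is the choice of the current-state features as instruments (a common choice in the literature, not something the abstract confirms).

```python
import numpy as np


def bellman_error_ls(phi, phi_next, rewards, gamma):
    """Plain least-squares Bellman error minimization (hypothetical helper).

    phi, phi_next: (n_samples, n_features) feature matrices for the current
    and next states of each sampled transition; rewards: (n_samples,).
    Regresses rewards on (phi - gamma * phi_next) by ordinary least squares.
    """
    x = phi - gamma * phi_next
    theta, *_ = np.linalg.lstsq(x, rewards, rcond=None)
    return theta


def bellman_error_iv(phi, phi_next, rewards, gamma):
    """Bellman error minimization with instrumental variables (hypothetical).

    Assumption: the current-state features phi serve as the instruments,
    so theta solves  phi^T (phi - gamma * phi_next) theta = phi^T rewards.
    """
    x = phi - gamma * phi_next
    return np.linalg.solve(phi.T @ x, phi.T @ rewards)


if __name__ == "__main__":
    # Toy two-state deterministic chain with one-hot features (illustrative).
    phi = np.eye(2)                               # states 0 and 1
    phi_next = np.array([[0.0, 1.0], [1.0, 0.0]])  # 0 -> 1, 1 -> 0
    rewards = np.array([1.0, 0.0])
    print(bellman_error_ls(phi, phi_next, rewards, gamma=0.5))
    print(bellman_error_iv(phi, phi_next, rewards, gamma=0.5))
```

On a deterministic toy chain like this one the two estimators coincide. They differ when transitions are noisy, because the regressor phi - gamma * phi_next then contains sampling noise that is correlated with the regression error, which is the situation instrumental variables are meant to address.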

Bibliographic details

Source
INFOR: Information Systems and Operational Research, 2020, Issue 4, pp. 141-166 (26 pages)
Authors
Moazehi Somayeh; Scott Warren R.; Powell Warren B.
Author affiliations

Stevens Inst Technol, Sch Business, Hoboken, NJ 07030 USA

Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Dynamic programming; approximate dynamic programming; approximate policy iteration; Bellman error minimization; direct policy search; energy storage
Date added to database: 2024-01-25 00:31:22
