Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
Stevens Institute of Technology, School of Business, Hoboken, NJ 07030, USA
Princeton University, Department of Operations Research and Financial Engineering, Princeton, NJ 08544, USA
Dynamic programming; approximate dynamic programming; approximate policy iteration; Bellman error minimization; direct policy search; energy storage