Expert Systems with Applications

Continuous control with Stacked Deep Dynamic Recurrent Reinforcement Learning for portfolio optimization


Abstract

Recurrent reinforcement learning (RRL) techniques have been used to optimize asset trading systems and have achieved outstanding results. However, the majority of previous work has been dedicated to systems with discrete action spaces. To address the challenge of continuous action and multi-dimensional state spaces, we propose the Stacked Deep Dynamic Recurrent Reinforcement Learning (SDDRRL) architecture to construct a real-time optimal portfolio. The algorithm captures up-to-date market conditions and rebalances the portfolio accordingly. Within this framework, the Sharpe ratio, one of the most widely accepted measures of risk-adjusted return, is used as the performance metric. Additionally, the performance of most machine learning algorithms depends strongly on their hyperparameter settings. We therefore equipped SDDRRL with the ability to find the best possible architecture topology using an automated Gaussian Process (GP) with Expected Improvement (EI) as the acquisition function. This allows us to select the architecture that maximizes the total return while respecting the cardinality constraints. Finally, our system was trained and tested in an online manner over 20 successive rounds on data for ten selected stocks from different sectors of the S&P 500, covering January 1st, 2013 to July 31st, 2017. The experiments reveal that the proposed SDDRRL achieves superior performance compared to three benchmarks: the rolling-horizon Mean-Variance Optimization (MVO) model, the rolling-horizon risk parity model, and the uniform buy-and-hold (UBAH) index. (C) 2019 Elsevier Ltd. All rights reserved.
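For reference, the Sharpe ratio mentioned above is the standard risk-adjusted return measure. The abstract does not specify which variant the paper optimizes (RRL work often uses a differential Sharpe ratio), so the plain ex-post form is shown here as an illustration:

```latex
% Ex-post Sharpe ratio over T portfolio returns R_1, ..., R_T,
% with an (assumed constant) risk-free rate R_f:
S = \frac{\bar{R} - R_f}{\sigma_R},
\qquad
\bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_t,
\qquad
\sigma_R = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\bigl(R_t - \bar{R}\bigr)^2}
```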
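The abstract describes hyperparameter tuning with a GP surrogate and EI acquisition but gives no implementation details. Below is a minimal, self-contained sketch of such a loop in Python; the objective function, the 1-D search range, and the random candidate pool are all stand-in assumptions (in the paper, each evaluation would be a full training round of SDDRRL scored by total return):

```python
# Sketch of sequential hyperparameter search with a Gaussian Process
# surrogate and Expected Improvement (EI) as the acquisition function.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """EI for maximization: (mu - y_best - xi)*Phi(z) + sigma*phi(z)."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)            # avoid division by zero
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def objective(x):
    # Hypothetical stand-in for "total return of the trained model"
    # as a function of a single hyperparameter in [0, 1].
    return -(x[0] - 0.3) ** 2 + 0.05 * np.sin(15 * x[0])

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5, 1))             # initial random evaluations
y = np.array([objective(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):                            # sequential BO iterations
    gp.fit(X, y)                               # refit GP to all data so far
    X_cand = rng.uniform(0, 1, size=(256, 1))  # random candidate pool
    ei = expected_improvement(X_cand, gp, y.max())
    x_next = X_cand[np.argmax(ei)]             # most promising candidate
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print("best hyperparameter:", X[np.argmax(y)], "best value:", y.max())
```

At each iteration the GP is refit to all evaluations so far, EI scores a pool of random candidates, and the EI maximizer is evaluated next, trading off exploring uncertain regions against exploiting promising ones.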
