
Continuous-time reinforcement learning approach for portfolio management with time penalization



Abstract

This paper considers the problem of policy optimization in the context of continuous-time Reinforcement Learning (RL), a branch of artificial intelligence, for financial portfolio management purposes. The underlying asset portfolio process is assumed to possess a continuous-time, discrete-state Markov chain structure subject to simplex and ergodicity constraints. The goal of the portfolio problem is the redistribution of a fund across different financial assets. Under one general assumption, namely that the market is arbitrage-free (no price arbitrage is possible), the problem of obtaining the optimal policy is solvable. We provide an RL solution based on an actor/critic architecture in which the market is characterized by a restriction called transaction cost, involving time penalization. The portfolio problem over Markov chains is determined by solving a convex quadratic minimization problem with linear constraints. Any Markov chain is generated by a stochastic transition matrix and the mathematical expectations of the rewards. In particular, we estimate the elements of the transition rate matrices and the mathematical expectations of the rewards. This method learns the optimal strategy for deciding what portfolio weights to take over a single period. With this strategy, the agent is able to choose the state with maximum utility and select its respective action. The optimal policy is computed with a novel proximal optimization approach, which involves time penalization in the transaction costs and the rewards. We employ the Lagrange multipliers approach to include the restrictions of the market and those imposed by the continuous time frame. Moreover, a specific numerical example in banking, which fits into the general portfolio framework, validates the effectiveness and usefulness of the proposed method. (C) 2019 Elsevier Ltd. All rights reserved.
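The abstract further describes solving a convex quadratic minimization with linear (simplex) constraints via a proximal approach. A minimal sketch under stated assumptions: a projected-gradient loop whose proximal step is Euclidean projection onto the probability simplex. The toy mean-variance objective, the function names, and the step-size/iteration defaults are illustrative, not the paper's formulation (which additionally penalizes transaction costs over time).

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto {w : w >= 0, sum(w) = 1}
    # via the standard sort-and-threshold algorithm.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def solve_portfolio_qp(Q, mu, lam=1.0, steps=2000, lr=0.1):
    # min_w  lam * w'Qw - mu'w   s.t.  w in the simplex.
    # lr/steps are ad hoc choices for this sketch, not tuned values.
    w = np.full(len(mu), 1.0 / len(mu))  # start from the uniform portfolio
    for _ in range(steps):
        grad = 2.0 * lam * Q @ w - mu
        w = project_simplex(w - lr * grad)  # gradient step + proximal projection
    return w
```

With equal expected rewards `mu`, the minimizer allocates weight inversely to variance, e.g. `Q = diag(0.04, 0.01)` yields roughly `[0.2, 0.8]`; the simplex projection is what enforces the "redistribution of a fund" constraint mentioned above.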

Bibliographic information

  • Source
    Expert Systems with Applications | 2019, Issue 9 | pp. 27-36 | 10 pages
  • Author affiliations

    Inst Politecn Nacl, Escuela Super Fis & Matemat, Bldg 9 UP Adolfo Lopez Mateos, Mexico City 07730, DF, Mexico|Natl Polytech Inst, Sch Phys & Math, Mexico City, DF, Mexico;

  • Indexing information
  • Format: PDF
  • Language: English
  • CLC (Chinese Library Classification)
  • Keywords

    Portfolio; Reinforcement learning; Transaction costs; Continuous-time; Markov chains;

