
Continuous-time reinforcement learning approach for portfolio management with time penalization



Abstract

This paper considers the problem of policy optimization in the context of continuous-time Reinforcement Learning (RL), a branch of artificial intelligence, for financial portfolio management purposes. The underlying asset portfolio process is assumed to possess a continuous-time, discrete-state Markov chain structure involving simplex and ergodicity constraints. The goal of the portfolio problem is the redistribution of a fund into different financial assets. One general assumption has to be made, namely that the market is arbitrage-free (no price arbitrage is possible); under this assumption, the problem of obtaining the optimal policy is solvable. We provide an RL solution based on an actor/critic architecture in which the market is characterized by a restriction called transaction cost, involving time penalization. The portfolio problem in Markov chains is solved as a convex quadratic minimization problem with linear constraints. Any Markov chain is generated by a stochastic transition matrix and the mathematical expectations of the rewards; in particular, we estimate the elements of the transition rate matrices and the mathematical expectations of the rewards. The method learns the optimal strategy in order to decide which portfolio weights to take for a single period. With this strategy, the agent is able to choose the state with maximum utility and select its respective action. The optimal policy is computed with a novel proximal optimization approach that involves time penalization in the transaction costs and the rewards. We employ the Lagrange multipliers approach to include the restrictions of the market and those imposed by the continuous time frame. Moreover, a specific numerical example in banking, which fits into the general portfolio framework, validates the effectiveness and usefulness of the proposed method. (C) 2019 Elsevier Ltd. All rights reserved.
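To make the abstract's construction concrete, the following is a minimal sketch of two ingredients it describes: maximum-likelihood estimation of the transition-rate matrix of the continuous-time Markov chain, and a simplex-constrained quadratic program whose objective adds a time-penalized transaction-cost term. This is not the authors' implementation: the function names, the penalty form cost * dt * ||w - w_prev||^2, the risk term, and the toy data are assumptions made purely for illustration.

import numpy as np
from scipy.optimize import minimize

def estimate_rate_matrix(jump_counts, holding_times):
    """Maximum-likelihood estimate of a CTMC transition-rate matrix Q from
    observed jump counts N[i, j] and total holding times T[i] per state."""
    n = jump_counts.shape[0]
    Q = np.zeros((n, n))
    for i in range(n):
        Q[i] = jump_counts[i] / max(holding_times[i], 1e-12)
        Q[i, i] = 0.0
        Q[i, i] = -Q[i].sum()          # rows of a rate matrix sum to zero
    return Q

def optimal_weights(expected_reward, risk_cov, w_prev, dt, cost=0.1, risk=1.0):
    """Simplex-constrained quadratic program (illustrative):
       maximize  w' r - (risk/2) w' C w - cost * dt * ||w - w_prev||^2
       subject to  w >= 0,  sum(w) == 1."""
    n = len(expected_reward)

    def objective(w):
        return -(w @ expected_reward
                 - 0.5 * risk * w @ risk_cov @ w
                 - cost * dt * np.sum((w - w_prev) ** 2))

    constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    bounds = [(0.0, 1.0)] * n
    res = minimize(objective, w_prev, bounds=bounds, constraints=constraints)
    return res.x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = rng.integers(0, 20, size=(3, 3)); np.fill_diagonal(N, 0)   # toy jump counts
    T = rng.uniform(5.0, 10.0, size=3)                             # toy holding times
    Q = estimate_rate_matrix(N, T)
    r = np.array([0.02, 0.05, 0.03])        # estimated expected rewards (toy)
    C = np.diag([0.01, 0.04, 0.02])         # toy risk (covariance) matrix
    w0 = np.ones(3) / 3                     # previous portfolio weights
    print("Q =\n", Q)
    print("weights =", optimal_weights(r, C, w0, dt=1.0))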

Bibliographic information

  • Source
    《Expert systems with applications》 | 2019, Issue 9 | pp. 27-36 | 10 pages
  • Author affiliations

    Inst Politecn Nacl Escuela Super Fis & Matemat Bldg 9 UP Adolfo Lopez Mateos Mexico City 07730 DF Mexico|Natl Polytech Inst Sch Phys & Math Mexico City DF Mexico;

    Inst Politecn Nacl Escuela Super Fis & Matemat Bldg 9 UP Adolfo Lopez Mateos Mexico City 07730 DF Mexico|Natl Polytech Inst Sch Phys & Math Mexico City DF Mexico;

    Inst Politecn Nacl Escuela Super Fis & Matemat Bldg 9 UP Adolfo Lopez Mateos Mexico City 07730 DF Mexico|Natl Polytech Inst Sch Phys & Math Mexico City DF Mexico;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    Portfolio; Reinforcement learning; Transaction costs; Continuous-time; Markov chains;

