
Continuous-time reinforcement learning approach for portfolio management with time penalization



Abstract

This paper considers the problem of policy optimization in the context of continuous-time Reinforcement Learning (RL), a branch of artificial intelligence, for financial portfolio management purposes. The underlying asset portfolio process is assumed to possess a continuous-time, discrete-state Markov chain structure involving simplex and ergodicity constraints. The goal of the portfolio problem is the redistribution of a fund into different financial assets. One general assumption has to be made, namely that the market is arbitrage-free (no price arbitrage is possible); under this assumption, the problem of obtaining the optimal policy is solvable. We provide an RL solution based on an actor/critic architecture in which the market is characterized by a restriction called transaction cost, involving time penalization. The portfolio problem in Markov chains is solved as a convex quadratic minimization problem with linear constraints. Any Markov chain is generated by a stochastic transition matrix and the mathematical expectations of the rewards; in particular, we estimate the elements of the transition rate matrices and the mathematical expectations of the rewards. The method learns the optimal strategy in order to decide which portfolio weights to take for a single period. With this strategy, the agent is able to choose the state with maximum utility and select its respective action. The optimal policy is computed with a novel proximal optimization approach that involves time penalization in the transaction costs and the rewards. We employ the Lagrange multipliers approach to include the restrictions of the market and those imposed by the continuous time frame. Moreover, a specific numerical example in banking, which fits into the general portfolio framework, validates the effectiveness and usefulness of the proposed method. (C) 2019 Elsevier Ltd. All rights reserved.
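To make the abstract's construction concrete, the following is a minimal sketch of two ingredients it describes: maximum-likelihood estimation of the transition-rate matrix of the continuous-time Markov chain, and a simplex-constrained quadratic program whose objective adds a time-penalized transaction-cost term. This is not the authors' implementation: the function names, the penalty form cost * dt * ||w - w_prev||^2, the risk term, and the toy data are assumptions made purely for illustration.

import numpy as np
from scipy.optimize import minimize

def estimate_rate_matrix(jump_counts, holding_times):
    """Maximum-likelihood estimate of a CTMC transition-rate matrix Q from
    observed jump counts N[i, j] and total holding times T[i] per state."""
    n = jump_counts.shape[0]
    Q = np.zeros((n, n))
    for i in range(n):
        Q[i] = jump_counts[i] / max(holding_times[i], 1e-12)
        Q[i, i] = 0.0
        Q[i, i] = -Q[i].sum()          # rows of a rate matrix sum to zero
    return Q

def optimal_weights(expected_reward, risk_cov, w_prev, dt, cost=0.1, risk=1.0):
    """Simplex-constrained quadratic program (illustrative):
       maximize  w' r - (risk/2) w' C w - cost * dt * ||w - w_prev||^2
       subject to  w >= 0,  sum(w) == 1."""
    n = len(expected_reward)

    def objective(w):
        return -(w @ expected_reward
                 - 0.5 * risk * w @ risk_cov @ w
                 - cost * dt * np.sum((w - w_prev) ** 2))

    constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    bounds = [(0.0, 1.0)] * n
    res = minimize(objective, w_prev, bounds=bounds, constraints=constraints)
    return res.x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = rng.integers(0, 20, size=(3, 3)); np.fill_diagonal(N, 0)   # toy jump counts
    T = rng.uniform(5.0, 10.0, size=3)                             # toy holding times
    Q = estimate_rate_matrix(N, T)
    r = np.array([0.02, 0.05, 0.03])        # estimated expected rewards (toy)
    C = np.diag([0.01, 0.04, 0.02])         # toy risk (covariance) matrix
    w0 = np.ones(3) / 3                     # previous portfolio weights
    print("Q =\n", Q)
    print("weights =", optimal_weights(r, C, w0, dt=1.0))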

Bibliographic information

  • Source
    《Expert systems with applications》 | 2019, Issue 9 | pp. 27-36 | 10 pages
  • Author affiliations

    Inst Politecn Nacl Escuela Super Fis & Matemat Bldg 9 UP Adolfo Lopez Mateos Mexico City 07730 DF Mexico|Natl Polytech Inst Sch Phys & Math Mexico City DF Mexico;

    Inst Politecn Nacl Escuela Super Fis & Matemat Bldg 9 UP Adolfo Lopez Mateos Mexico City 07730 DF Mexico|Natl Polytech Inst Sch Phys & Math Mexico City DF Mexico;

    Inst Politecn Nacl Escuela Super Fis & Matemat Bldg 9 UP Adolfo Lopez Mateos Mexico City 07730 DF Mexico|Natl Polytech Inst Sch Phys & Math Mexico City DF Mexico;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    Portfolio; Reinforcement learning; Transaction costs; Continuous-time; Markov chains;

