Applied Energy

Reinforcement learning based automated history matching for improved hydrocarbon production forecast


Abstract

History matching aims to find a numerical reservoir model that can be used to predict reservoir performance. An engineer and a model-calibration (data-inversion) method are required to adjust the various parameters/properties of the numerical model to match the reservoir production history. In this study, we develop deep neural networks within a reinforcement learning framework to achieve automated history matching that reduces engineers' effort and human bias, automatically and intelligently explores the parameter space, and removes the need for a large set of labeled training data. To that end, a fast-marching-based reservoir simulator is encapsulated as an environment for the proposed reinforcement learning. The deep-neural-network-based learning agent interacts with the reservoir simulator within the reinforcement learning framework to achieve automated history matching. Reinforcement learning techniques, namely the discrete Deep Q Network and the continuous Deep Deterministic Policy Gradient, are used to train the learning agents. Continuous actions enable the Deep Deterministic Policy Gradient to explore more states at each iteration in a learning episode; consequently, a better history match is achieved with this algorithm than with the Deep Q Network. For simplified dual-target composite reservoir models, the best history-matching performances of the discrete and continuous learning methods, in terms of normalized root mean square error, are 0.0447 and 0.0038, respectively. Our study shows that the continuous action space of the Deep Deterministic Policy Gradient drastically outperforms the Deep Q Network.
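
To make the environment-agent interface concrete, below is a minimal, hypothetical Python sketch of how a reservoir simulator can be wrapped as a reinforcement learning environment for history matching. This is not the authors' implementation: the fast-marching simulator is replaced by a toy forward model, the learned DDPG policy is replaced by random continuous perturbations, and all names (HistoryMatchingEnv, toy_forward_model) and numbers are illustrative assumptions. The reward is the negative normalized root mean square error (NRMSE) between simulated and observed production, mirroring the metric reported in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PARAMS = 3                                  # assumed: e.g. three regional permeability multipliers
TRUE_PARAMS = np.array([0.8, 1.2, 0.5])       # assumed "true" model generating the observed history
TIMESTEPS = np.linspace(1.0, 10.0, 20)

def toy_forward_model(params):
    """Stand-in for the fast-marching reservoir simulator: maps model
    parameters to a synthetic production-rate time series."""
    return params[0] * np.exp(-params[1] * TIMESTEPS / 10.0) + params[2]

OBSERVED_RATES = toy_forward_model(TRUE_PARAMS)  # stands in for field production history

class HistoryMatchingEnv:
    """State: current parameter estimate. Action: continuous parameter
    increments (the DDPG setting). Reward: negative NRMSE between
    simulated and observed production."""

    def reset(self):
        self.params = rng.uniform(0.1, 2.0, size=N_PARAMS)
        return self.params.copy()

    def step(self, action):
        self.params = np.clip(self.params + action, 0.05, 3.0)
        simulated = toy_forward_model(self.params)
        nrmse = (np.sqrt(np.mean((simulated - OBSERVED_RATES) ** 2))
                 / (OBSERVED_RATES.max() - OBSERVED_RATES.min()))
        done = nrmse < 0.005                  # assumed acceptance threshold
        return self.params.copy(), -nrmse, done

# Random-perturbation stand-in for the DDPG agent: a real agent would learn
# a policy mapping state to action; here we only sample small continuous moves.
env = HistoryMatchingEnv()
state = env.reset()
best_nrmse = np.inf
for _ in range(2000):
    action = rng.normal(0.0, 0.05, size=N_PARAMS)
    state, reward, done = env.step(action)
    best_nrmse = min(best_nrmse, -reward)
    if done:
        break
print(f"best normalized RMSE: {best_nrmse:.4f}")
```

A discrete Deep Q Network agent would instead choose from a small fixed set of parameter increments at each step, which illustrates the abstract's point: a continuous action space lets the agent cover far more of the parameter space per iteration.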