Source: American Control Conference

Mean First-Passage Time Control Policy versus Reinforcement-Learning Control Policy in Gene Regulatory Networks

Abstract

Probabilistic Boolean networks are rule-based models for gene regulatory networks. They are used to design intervention strategies in translational genomics, such as cancer treatment. Previously, methods for finding control policies with the highest effect on the steady-state distributions of probabilistic Boolean networks have been proposed. These methods were derived using the theory of infinite-horizon stochastic control. It is well known that the direct application of optimal control methods is problematic owing to their high computational complexity and the fact that they require the inference of the system model. To bypass the impediment of model estimation, two algorithms for approximating the optimal control policy have been introduced. These algorithms are based on reinforcement learning and mean first-passage times. In this work, the performance of these two methods is compared using both a melanoma-related network and randomly generated networks. It is shown that the mean-first-passage-time-based algorithm outperforms the reinforcement-learning-based algorithm for smaller amounts of training data, which corresponds better to feasible experimental conditions. In contrast to the reinforcement-learning-based algorithm, the mean-first-passage-time-based algorithm does not require the application of control during its learning period. This matters because intervention in biological systems during the learning phase may induce undesirable side effects.
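
The abstract does not say how mean first-passage times are obtained. As a rough illustration only, the sketch below assumes the network is viewed as a finite Markov chain with a given row-stochastic transition matrix, and computes the expected number of steps needed to first reach a chosen set of states by solving the standard linear system (I - Q)k = 1 over the remaining states. The function name `mean_first_passage_times`, the toy matrix `P`, and the choice of target set are all hypothetical; this is not the paper's algorithm, which is described as avoiding model inference.

```python
from fractions import Fraction


def mean_first_passage_times(P, targets):
    """Expected number of steps to first reach `targets` from each state.

    P is a row-stochastic transition matrix (list of rows) and `targets`
    is a set of state indices. States inside `targets` get the value 0.
    """
    n = len(P)
    others = [s for s in range(n) if s not in targets]
    m = len(others)

    # Augmented system [I - Q | 1], where Q is P restricted to non-target
    # states. Fractions keep the arithmetic exact for this small example.
    A = [
        [Fraction(int(i == j)) - Fraction(P[others[i]][others[j]])
         for j in range(m)] + [Fraction(1)]
        for i in range(m)
    ]

    # Gauss-Jordan elimination.
    for col in range(m):
        pivot = next((r for r in range(col, m) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("target set is not reachable from every state")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    k = [Fraction(0)] * n
    for i, s in enumerate(others):
        k[s] = A[i][m]
    return k


# Hypothetical 3-state chain; state 2 plays the role of a target state.
half = Fraction(1, 2)
P = [
    [half, half, 0],
    [half, 0, half],
    [0, 0, 1],
]
print(mean_first_passage_times(P, {2}))
```

In a setting like the one the abstract describes, such quantities would presumably be estimated from observed state trajectories rather than from a known matrix; the exact computation here only shows what is being estimated.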
