Journal: Bioinformatics

The RNA Newton polytope and learnability of energy parameters

Abstract

Motivation: Computational RNA structure prediction is a mature and important problem. It has received a new wave of attention with the discovery of regulatory non-coding RNAs and the advent of high-throughput transcriptome sequencing. Despite nearly two score years of research on RNA secondary structure and RNA–RNA interaction prediction, the accuracy of state-of-the-art algorithms is still far from satisfactory. So far, researchers have proposed increasingly complex energy models and improved parameter estimation methods, experimental and/or computational, in the hope of giving their methods enough power to solve the problem. Disappointingly, the result has been only modest improvements that do not match expectations. Even recent machine learning approaches with massive numbers of features have not been able to break this barrier. Why is that?

Approach: The first step toward high-accuracy structure prediction is to pick an energy model that is inherently capable of predicting each and every known structure to date. In this article, we introduce the notion of learnability of the parameters of an energy model as a measure of this inherent capability. We say that the parameters of an energy model are learnable iff there exists at least one parameter set that makes every RNA structure known to date the minimum free energy (MFE) structure of its sequence. We derive a necessary condition for learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function, viewed as a polynomial in the energy parameters. To the best of our knowledge, this is the first approach toward computing the RNA Newton polytope and the first systematic assessment of the inherent capabilities of an energy model. The worst-case complexity of our algorithm is exponential in the number of features. However, dimensionality reduction techniques can provide approximate solutions that avoid the curse of dimensionality.

Results: We demonstrate the application of our theory to a simple energy model consisting of a weighted count of A-U, C-G and G-U base pairs. Our results show that this simple energy model satisfies the necessary condition for more than half (55%) of the input unpseudoknotted sequence–structure pairs chosen from the RNA STRAND v2.0 database. It severely violates the condition for ∼13%, which form a set of hard cases that require further investigation. Of the 1350 RNA strands, 749 have an observed 3D feature vector on the surface of the computed polytope. For 289 strands, the observed feature vector is not on the boundary of the polytope, but its distance from the boundary is at most one. A distance of one essentially means a difference of one base pair between the observed structure and the closest point on the boundary of the polytope, and that point need not be the feature vector of any structure. For 171 sequences this distance is larger than two, and for only 11 sequences it is larger than five.

Availability: The source code is available on .

Contact:
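The definitions in the Approach paragraph can be stated formally for the three-feature model used in the Results. The symbols below are our own notation, not necessarily the paper's: $\phi$ is the feature map, $w$ the energy weights, $\mathcal{Y}(x)$ the ensemble of feasible structures of sequence $x$, and $RT$ the usual temperature factor. We also read "learnable" as requiring a nonzero $w$ with ties allowed, which is an assumption on our part.

```latex
% Feature vector and linear energy for the three-feature model
\phi(y) = \bigl(n_{\mathrm{AU}}(y),\ n_{\mathrm{CG}}(y),\ n_{\mathrm{GU}}(y)\bigr) \in \mathbb{Z}_{\ge 0}^{3},
\qquad E_w(y) = w \cdot \phi(y), \quad w \in \mathbb{R}^{3}

% Learnability on a set D of known sequence--structure pairs (x, y^*)
\exists\, w \neq 0 \ \ \forall (x, y^{*}) \in D \ \ \forall y \in \mathcal{Y}(x):\quad
w \cdot \phi(y^{*}) \le w \cdot \phi(y)

% Partition function as a polynomial in z_k = e^{-w_k / RT}
Z_x = \sum_{y \in \mathcal{Y}(x)} e^{-E_w(y)/RT}
    = \sum_{y \in \mathcal{Y}(x)} \prod_{k=1}^{3} z_k^{\phi_k(y)}

% Its Newton polytope is the convex hull of the exponent vectors
\mathrm{Newt}(Z_x) = \operatorname{conv}\{\phi(y) : y \in \mathcal{Y}(x)\}

% Necessary condition for each (x, y^*): phi(y^*) lies on the boundary of Newt(Z_x)
\phi(y^{*}) \in \partial\, \mathrm{Newt}(Z_x)
```

The last line follows because a weight vector that makes $y^{*}$ an MFE structure defines a supporting plane of the polytope at $\phi(y^{*})$. Checking it one sequence at a time is only necessary: learnability requires a single $w$ that works for every pair in $D$ simultaneously.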
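The Approach paragraph says that the algorithm uses dynamic programming to compute the convex hull of the feature vectors of all feasible structures of a sequence. Below is a minimal Python sketch of that idea for the three-feature model. It rests on assumptions of ours:

- a Nussinov-style (pseudoknot-free) decomposition,
- only A-U, C-G and G-U pairs allowed,
- a minimum hairpin loop of three unpaired bases,
- the feature order (A-U, C-G, G-U).

The names `feature_vectors`, `PAIR_FEATURE` and `MIN_HAIRPIN` are hypothetical. For simplicity, each cell keeps every reachable feature vector instead of only the hull vertices, so the sketch is meant for short sequences only.

```python
from functools import lru_cache

Vec = tuple[int, int, int]  # (#A-U, #C-G, #G-U) pairs; this feature order is our assumption

# Allowed pairs mapped to the index of the feature they increment (assumed pair set).
PAIR_FEATURE = {
    ("A", "U"): 0, ("U", "A"): 0,
    ("C", "G"): 1, ("G", "C"): 1,
    ("G", "U"): 2, ("U", "G"): 2,
}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases enclosed by a hairpin


def feature_vectors(seq: str) -> set[Vec]:
    """All distinct feature vectors over the pseudoknot-free structures of seq."""

    @lru_cache(maxsize=None)
    def cell(i: int, j: int) -> frozenset[Vec]:
        # Empty interval: only the open chain, which has no pairs.
        if i > j:
            return frozenset({(0, 0, 0)})
        # Case 1: base i is unpaired.
        found = set(cell(i + 1, j))
        # Case 2: base i pairs with k; combine the inside and outside intervals.
        for k in range(i + MIN_HAIRPIN + 1, j + 1):
            idx = PAIR_FEATURE.get((seq[i], seq[k]))
            if idx is None:
                continue
            for a in cell(i + 1, k - 1):
                for b in cell(k + 1, j):  # Minkowski sum of the two sets
                    v = [a[t] + b[t] for t in range(3)]
                    v[idx] += 1  # count the new (i, k) pair
                    found.add((v[0], v[1], v[2]))
        return frozenset(found)

    return set(cell(0, len(seq) - 1))
```

For example, `feature_vectors("GGAAACC")` returns the three vectors (0, 0, 0), (0, 1, 0) and (0, 2, 0). These correspond to the open chain, one C-G pair, and two nested C-G pairs. A real polytope DP can prune each cell down to its hull vertices for two reasons:

- the hull of a Minkowski sum is the Minkowski sum of the hulls;
- the hull of a union is the hull of the union of the hulls.

This sketch skips that pruning step.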

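The Results classify each strand by whether its observed 3D feature vector lies on the surface of the polytope. Below is a minimal exact test for three features that uses only the standard library. The name `on_hull_boundary` is hypothetical, and the function assumes that the observed vector is itself one of the points, since an observed structure belongs to its own ensemble.

The test relies on two facts:

- a point is on the boundary of a 3D convex hull exactly when some nonzero $w$ satisfies $w \cdot (f - f_{obs}) \ge 0$ for every point $f$;
- unless all points are collinear, such a $w$, when it exists, can be taken as the normal of a plane through $f_{obs}$ and two other points.

The sketch does not compute the distance to the boundary that the Results also report.

```python
from itertools import combinations

Vec = tuple[int, int, int]  # e.g. (#A-U, #C-G, #G-U) base-pair counts


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> int:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def on_hull_boundary(points: set[Vec], obs: Vec) -> bool:
    """True iff obs (assumed to be in points) is on the boundary of conv(points).

    Equivalently, some nonzero weight vector w makes obs a minimiser of w . f,
    i.e. the observed structure is a (possibly tied) MFE under E = w . f.
    """
    diffs = [_sub(p, obs) for p in points if p != obs]
    # Candidate supporting-plane normals: planes through obs and two other points.
    normals = [_cross(u, v) for u, v in combinations(diffs, 2)]
    normals = [n for n in normals if n != (0, 0, 0)]
    if not normals:
        # All points are collinear with obs, so the hull has no interior in R^3.
        return True
    for n in normals:
        for w in (n, (-n[0], -n[1], -n[2])):
            # w supports the hull at obs if no point has a lower "energy" w . f.
            if all(_dot(w, d) >= 0 for d in diffs):
                return True
    return False
```

The number of candidate normals grows quadratically and each is checked against every point, so the test takes cubic time. That keeps it practical only for small point sets; a linear program or a convex hull library would scale better.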