Integrative analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: a non-linear model to predict abundance of undetected proteins


Abstract

Motivation: Gene expression profiling technologies can generally produce mRNA abundance data for all genes in a genome. A dearth of proteomic data persists because the identification range and sensitivity of proteomic measurements lag behind those of transcriptomic measurements. Integrative transcriptomic and proteomic analysis based on partial proteomic data is likely to introduce significant bias. Developing methodologies to accurately estimate missing proteomic data will allow better integration of transcriptomic and proteomic datasets and provide deeper insight into the metabolic mechanisms underlying complex biological systems.

Results: In this study, we present a non-linear data-driven model to predict abundance for undetected proteins using two independent datasets of cognate transcriptomic and proteomic data collected from Desulfovibrio vulgaris. We use stochastic gradient boosted trees (GBT) to uncover possible non-linear relationships between transcriptomic and proteomic data, and to predict protein abundance for the proteins not experimentally detected based on relevant predictors such as mRNA abundance, cellular role, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. Initially, we constructed a GBT model using all possible variables to assess their relative importance and characterize the behavior of the predictive model. This model showed a strong plateau effect in regions of high mRNA values and sparse data. Hence, we removed genes in those regions based on thresholds estimated from the partial dependency plots where this behavior was captured. At this stage, only the strongest predictors of protein abundance were retained to reduce the complexity of the GBT model. After removing genes in the plateau region, mRNA abundance, main cellular functional categories and a few triple codon counts emerged as the top-ranked predictors of protein abundance.
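As a rough illustration of the stochastic gradient boosting idea described here, the following Python sketch fits a sequence of one-split regression trees (stumps) to the residuals of a squared-error loss, drawing a random subsample of rows for each tree. It is not the authors' implementation: the synthetic data, the saturating mRNA-to-protein curve, the second predictor, and all names and parameter values (`fit_gbt`, `n_trees`, `learning_rate`, `subsample`) are assumptions made for the example.

```python
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        # No split possible (all rows identical): predict the mean residual.
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm


def fit_gbt(X, y, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with stumps and squared-error loss."""
    rng = random.Random(seed)
    base = sum(y) / len(y)          # initial prediction: the mean response
    pred = [base] * len(y)
    stumps = []
    k = max(2, int(subsample * len(y)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(y)), k)       # random row subsample
        Xs = [X[i] for i in idx]
        rs = [y[i] - pred[i] for i in idx]       # residuals = negative gradient
        stump = fit_stump(Xs, rs)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, row)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


def r_squared(y, yhat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): protein abundance saturates at high mRNA values.
data_rng = random.Random(42)
X, y = [], []
for _ in range(60):
    mrna = data_rng.uniform(0.0, 10.0)
    codon = data_rng.uniform(0.0, 1.0)   # hypothetical, uninformative predictor
    X.append([mrna, codon])
    y.append(5.0 * mrna / (mrna + 2.0) + data_rng.gauss(0.0, 0.3))

model = fit_gbt(X, y)
r2 = r_squared(y, [predict(model, row) for row in X])
print("training R^2:", round(r2, 3))
```

The reported value is a training-set fit on toy data and says nothing about the coefficients of determination in the study; a real analysis would evaluate on held-out genes.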
We then created a new tuned GBT model using the five most significant predictors. Our non-linear model is built as a series of regression tree models, which have an inherent strength in variable selection. The model provides relative variable importance measures using mean squared error as the criterion. The results showed that coefficients of determination for our non-linear models ranged from 0.393 to 0.582 in both datasets, better than those obtained with the linear regression used in the past. We evaluated the validity of this non-linear model using biological information on operons, regulons and pathways, and the results demonstrated that the coefficients of variation of estimated protein abundance values within operons, regulons or pathways are indeed smaller than those for random groups of proteins.
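The operon-based validity check can be sketched in a few lines of Python: compute the coefficient of variation (standard deviation divided by mean) of predicted abundance within each biological group, then compare the average with that of randomly drawn groups of the same sizes. This is only a minimal sketch under stated assumptions: the gene identifiers, operon assignments, abundance values, and function names below are all hypothetical and are not taken from the study.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def mean_group_cv(groups, abundance):
    """Average CV over all groups that contain at least two genes."""
    cvs = [coefficient_of_variation([abundance[g] for g in genes])
           for genes in groups.values() if len(genes) >= 2]
    return statistics.mean(cvs)


def random_group_cvs(groups, abundance, n_draws=1000, seed=0):
    """Average CV for random gene groups with the same sizes as the real groups."""
    rng = random.Random(seed)
    all_genes = sorted(abundance)
    sizes = [len(genes) for genes in groups.values() if len(genes) >= 2]
    draws = []
    for _ in range(n_draws):
        cvs = [coefficient_of_variation(
                   [abundance[g] for g in rng.sample(all_genes, k)])
               for k in sizes]
        draws.append(statistics.mean(cvs))
    return draws


# Hypothetical predicted protein abundances (arbitrary units).
abundance = {
    "geneA1": 10.0, "geneA2": 11.0, "geneA3": 9.0,
    "geneB1": 50.0, "geneB2": 55.0,
    "geneC1": 200.0, "geneC2": 190.0, "geneC3": 210.0,
}
# Hypothetical operon membership.
operons = {
    "operonA": ["geneA1", "geneA2", "geneA3"],
    "operonB": ["geneB1", "geneB2"],
    "operonC": ["geneC1", "geneC2", "geneC3"],
}

observed = mean_group_cv(operons, abundance)
draws = random_group_cvs(operons, abundance)
# Fraction of random groupings that are at least as tight as the real operons.
fraction_as_tight = sum(d <= observed for d in draws) / len(draws)
print("mean CV within operons:", round(observed, 4))
print("mean CV in random groups:", round(statistics.mean(draws), 4))
print("fraction of random draws as tight:", fraction_as_tight)
```

The same comparison applies unchanged to regulons or pathways by swapping in a different group-membership dictionary.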
