Conference: Research in computational molecular biology

Proteome Coverage Prediction for Integrated Proteomics Datasets

Abstract

Comprehensive characterization of a proteome is a fundamental goal in proteomics. To maximize proteome coverage for a complex protein mixture, i.e. to identify as many proteins as possible, several different fractionation experiments are typically performed and the individual fractions are subjected to mass spectrometric analysis. The resulting data are integrated into large and heterogeneous datasets. Proteome coverage prediction is the task of extrapolating the number of proteins that future measurements will discover, conditioned on a sequence of measurements already performed. Predicting proteome coverage at an early stage enables experimentalists to design and plan efficient proteomics studies. To date, no method reliably predicts proteome coverage from integrated datasets. We present a generalized hierarchical Pitman-Yor process model that explicitly captures the redundancy within integrated datasets. We assess the prediction accuracy of our approach on an integrated proteomics dataset for the bacterium L. interrogans and demonstrate that it outperforms ad hoc extrapolation methods and prediction methods designed for non-integrated datasets. Furthermore, we estimate the maximally achievable proteome coverage for the experimental setup underlying the L. interrogans dataset. We discuss the implications of our results for determining rational stop criteria and their influence on the design of efficient and reliable proteomics studies.
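To make the extrapolation task concrete, here is a minimal sketch of coverage prediction under a plain, single-level Pitman-Yor process. It is not the generalized hierarchical model the abstract describes, and it does not model redundancy across integrated datasets. The function name, the parameter names, and the assumption that the discount and concentration parameters are already known (rather than inferred from data) are all illustrative choices, not taken from the paper.

```python
def expected_new_discoveries(n_seen, k_seen, n_future, discount, concentration):
    """Expected number of new proteins found in n_future further identifications.

    Assumes a single-level Pitman-Yor process with fixed, known parameters
    (a simplification for illustration, not the paper's hierarchical model).

    n_seen        -- identifications made so far
    k_seen        -- distinct proteins discovered so far
    discount      -- Pitman-Yor discount d, with 0 <= d < 1
    concentration -- Pitman-Yor concentration theta, with theta > -d
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")
    if concentration + n_seen <= 0:
        raise ValueError("concentration + n_seen must be positive")

    k = float(k_seen)
    for n in range(n_seen, n_seen + n_future):
        # Probability that the next identification is a previously unseen
        # protein: (theta + d * k) / (theta + n). The update is linear in k,
        # so iterating on the expectation is exact for fixed parameters.
        k += (concentration + discount * k) / (concentration + n)
    return k - k_seen


if __name__ == "__main__":
    # Made-up numbers, purely to show the call; they are not from the paper.
    gain = expected_new_discoveries(
        n_seen=10_000, k_seen=1_500, n_future=5_000,
        discount=0.5, concentration=100.0,
    )
    print(f"expected new proteins: {gain:.1f}")
```

Under these assumptions, a stop criterion could compare the expected gain from one more experiment against a chosen threshold. In practice the parameters would have to be estimated from the measurements already performed.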
