Journal: BMC Bioinformatics
Harvest: an open-source tool for the validation and improvement of peptide identification metrics and fragmentation exploration

Abstract

Background: Protein identification using mass spectrometry is an important tool in many areas of the life sciences, and in proteomics research in particular. Increasing the number of correctly identified proteins depends on the ability to incorporate new knowledge about the mass spectrometry fragmentation process into computational algorithms designed to separate true matches of peptides to unidentified mass spectra from spurious matches. This discrimination is achieved by computing a function of the various features of the potential match between the observed and theoretical spectra to give a numerical approximation of their similarity. It is these underlying "metrics" that determine the ability of a protein identification package to maximise correct identifications while limiting false discovery rates. There is currently no software available specifically for the simple implementation and analysis of arbitrary novel metrics for peptide matching and for the exploration of fragmentation patterns in a given dataset.

Results: We present Harvest, an open-source software tool for analysing fragmentation patterns and for assessing how well a new piece of information about the MS/MS fragmentation process differentiates between correct and random peptide assignments. We demonstrate this functionality using metrics derived from the properties of individual datasets in a peptide identification context. Using Harvest, we demonstrate how the development of such metrics may improve confidence in correct peptide assignments in the context of a high-throughput proteomics experiment, and we characterise properties of peptide fragmentation.

Conclusions: Harvest provides a simple framework in C++ for analysing and prototyping metrics for peptide matching, the core of the protein identification problem. It is not a protein identification package, and it answers a different research question from packages such as Sequest, Mascot, and X!Tandem. It does not aim to maximise the number of assigned peptides from a set of unknown spectra, but instead provides a method by which researchers can explore fragmentation properties and assess the power of novel metrics for peptide matching in the context of a given experiment. Metrics developed using Harvest may then become candidates for later integration into protein identification packages.
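To make the notion of a "metric" concrete, the following is a minimal C++ sketch of one of the simplest possible similarity functions between an observed and a theoretical spectrum: a shared peak count within an m/z tolerance. It is an illustration only and is not taken from Harvest; the function names `shared_peak_count` and `matched_fraction`, the use of plain sorted m/z vectors, and the absolute tolerance are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Harvest): counts the theoretical fragment
// m/z values that have at least one observed peak within `tolerance`
// (absolute, in m/z units). Both vectors must be sorted in ascending order.
// A single observed peak may support more than one theoretical fragment.
std::size_t shared_peak_count(const std::vector<double>& observed_mz,
                              const std::vector<double>& theoretical_mz,
                              double tolerance) {
    assert(tolerance >= 0.0);
    std::size_t matches = 0;
    std::size_t i = 0;  // index into observed_mz; only ever moves forward
    for (double t : theoretical_mz) {
        // Skip observed peaks that lie too far below this theoretical peak.
        while (i < observed_mz.size() && observed_mz[i] < t - tolerance) {
            ++i;
        }
        // The first remaining observed peak is the closest candidate from below
        // the upper bound; it matches if it falls inside the tolerance window.
        if (i < observed_mz.size() && std::fabs(observed_mz[i] - t) <= tolerance) {
            ++matches;
        }
    }
    return matches;
}

// Hypothetical metric: fraction of theoretical fragments that were matched,
// in [0, 1]. Returns 0 when the theoretical spectrum is empty.
double matched_fraction(const std::vector<double>& observed_mz,
                        const std::vector<double>& theoretical_mz,
                        double tolerance) {
    if (theoretical_mz.empty()) {
        return 0.0;
    }
    return static_cast<double>(
               shared_peak_count(observed_mz, theoretical_mz, tolerance)) /
           static_cast<double>(theoretical_mz.size());
}
```

Calling `matched_fraction(observed, theoretical, 0.1)` on two sorted m/z lists yields a single number per candidate peptide. A metric of this kind is useful for identification only to the extent that its values for correct assignments are separated from its values for random ones, which is the property the abstract says Harvest is designed to assess.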
