Computational algorithms for spectral prediction and motif discovery in proteomic sequence data.

Abstract

As the products of the information age continue to permeate the biological realm, there is an ever-growing need for computational tools that can keep pace. Two such tools are presented here. Although largely unrelated in their details, both stem from a desire to better understand biological sequences by creating algorithms that harness the statistical power contained in large-scale proteomic data sets.

The first tool predicts tandem mass spectral fragment ion intensities, with the goal of improving peptide sequencing. Using a large database of confidently assigned doubly charged spectra, data were collected on the two residues flanking each fragmentation site, as well as on the site's relative position within the peptide by mass. In addition to revealing previously unvisualized trends in tandem mass spectra, the results indicate that this information, used in conjunction with the outlined spectral prediction methodology, is sufficient to model fragment ion intensities with high accuracy, given the inherent variability of tandem mass spectra. Furthermore, to assess the likelihood of a sequence/spectrum identification, a scoring scheme based on the overlap of high-intensity peaks between actual and predicted spectra is described. The SPIIDR (Spectral Prediction of Ion Intensities using DiResidues) algorithm is available for public use at http://gygi.med.harvard.edu/spiidr/.

The second tool was initially aimed at discovering phosphorylation motifs in large-scale phosphoproteomic studies; however, this work also demonstrates its success at extracting overrepresented patterns from any sequence-based data set, including whole proteins and linguistic text. To deconvolute a data set into its constituent motifs, the algorithm couples a dynamic statistical background to an iterative two-phase methodology based on recursive motif building and subsequent set reduction.
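
The two-phase procedure described above can be sketched as follows. This is a minimal illustration, not the motif-x implementation: it assumes aligned, equal-length sequences centered on the modified residue, a binomial test of each residue/position pair against the current background, and a fixed significance threshold and minimum count. The function names and default parameters (`p_cut=1e-6`, `min_count=20`) are hypothetical.

```python
"""Hypothetical sketch of a motif-x-style two-phase motif extraction loop."""
from collections import Counter
from math import exp, lgamma, log


def binom_sf(k, n, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_term)
    return min(total, 1.0)


def matches(seq, motif):
    """True if seq has every fixed residue in motif (a dict: position -> residue)."""
    return all(seq[pos] == res for pos, res in motif.items())


def build_motif(fg, bg, center, p_cut, min_count):
    """Phase 1: repeatedly fix the most significant residue/position pair."""
    motif = {}
    while True:
        # Both sets shrink as residues are fixed, so background frequencies change.
        fg_sub = [s for s in fg if matches(s, motif)]
        bg_sub = [s for s in bg if matches(s, motif)]
        if not fg_sub or not bg_sub:
            return motif
        best = None  # (p-value, position, residue)
        for pos in range(len(fg_sub[0])):
            if pos == center or pos in motif:
                continue
            fg_counts = Counter(s[pos] for s in fg_sub)
            bg_counts = Counter(s[pos] for s in bg_sub)
            for res, k in fg_counts.items():
                if k < min_count:
                    continue
                pval = binom_sf(k, len(fg_sub), bg_counts[res] / len(bg_sub))
                if pval < p_cut and (best is None or pval < best[0]):
                    best = (pval, pos, res)
        if best is None:
            return motif
        motif[best[1]] = best[2]


def extract_motifs(fg, bg, center, p_cut=1e-6, min_count=20):
    """Phase 2: after each motif, remove its matches from foreground and background."""
    motifs = []
    while fg and bg:
        motif = build_motif(fg, bg, center, p_cut, min_count)
        if not motif:
            break
        motifs.append(motif)
        fg = [s for s in fg if not matches(s, motif)]
        bg = [s for s in bg if not matches(s, motif)]
    return motifs
```

Because both the foreground and the background are filtered by every fixed residue and by every extracted motif, the background frequencies change at each step; this is one plausible reading of the "dynamic statistical background" mentioned above.
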
Validation of the approach is demonstrated through numerous positive-control data sets and through the congruence of extracted motifs with those discovered using orthogonal strategies. Comparisons with other widely used protein motif discovery tools, together with the algorithm's ability to extract previously known, biologically significant motifs, further highlight its success. Finally, an in-depth overview of the methodology's online implementation, known as motif-x (http://motif-x.med.harvard.edu), is provided.
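
For the first tool, the idea of modeling intensities from the two residues flanking each cleavage site plus the site's relative position by mass, and of scoring candidates by high-intensity peak overlap, can be illustrated with a short sketch. Everything here is an illustrative assumption rather than SPIIDR's actual model: the hypothetical lookup table `table[(left, right, pos_bin)] = (b_intensity, y_intensity)`, three position bins, singly charged b/y fragments, a default intensity for unseen keys, and a top-N overlap score with N = 10 and a 0.5 m/z tolerance.

```python
"""Hypothetical sketch: diresidue-keyed intensity lookup plus a top-N peak-overlap score."""

# Standard monoisotopic residue masses (Da); cysteine is unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """Yield (site, prefix_mass, b_mz, y_mz) for singly charged b/y ions.

    `site` is the index of the first residue C-terminal to the cleaved bond.
    """
    total = sum(RESIDUE_MASS[r] for r in peptide)
    prefix = 0.0
    for site in range(1, len(peptide)):
        prefix += RESIDUE_MASS[peptide[site - 1]]
        yield site, prefix, prefix + PROTON, (total - prefix) + WATER + PROTON


def predict_spectrum(peptide, table, n_bins=3, default=(0.1, 0.1)):
    """Return [(m/z, relative intensity), ...] with a b and a y peak per cleavage site.

    Intensities come from table[(left_residue, right_residue, position_bin)],
    where the bin is the site's relative position by residue mass.
    """
    total = sum(RESIDUE_MASS[r] for r in peptide)
    peaks = []
    for site, prefix, b_mz, y_mz in fragment_ions(peptide):
        pos_bin = min(int(prefix / total * n_bins), n_bins - 1)
        key = (peptide[site - 1], peptide[site], pos_bin)
        b_int, y_int = table.get(key, default)
        peaks.append((b_mz, b_int))
        peaks.append((y_mz, y_int))
    return peaks


def top_n_overlap(predicted, observed, n=10, tol=0.5):
    """Fraction of the n most intense predicted peaks lying within tol (m/z)
    of one of the n most intense observed peaks."""
    top_pred = sorted(predicted, key=lambda pk: pk[1], reverse=True)[:n]
    top_obs = sorted(observed, key=lambda pk: pk[1], reverse=True)[:n]
    if not top_pred:
        return 0.0
    hits = sum(1 for mz, _ in top_pred
               if any(abs(mz - obs_mz) <= tol for obs_mz, _ in top_obs))
    return hits / len(top_pred)
```

In use, such a table would be filled from intensities observed in confidently assigned spectra, and `top_n_overlap(predict_spectrum(candidate, table), observed)` would rank candidate sequences for a spectrum; a score of 1.0 means every top predicted peak has an observed counterpart.
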

Record details

  • Author: Schwartz, Daniel
  • Author affiliation: Harvard University
  • Degree-granting institution: Harvard University
  • Subject: Molecular biology
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 161
  • Original format: PDF
  • Language: English
