Mass spectrometry-based proteomic data analysis.

Abstract

Proteomics studies large-scale cellular functions directly at the protein level. Mass spectrometry (MS) has been a primary tool for conducting high-throughput proteomic experiments. In a typical shotgun proteomic experiment, proteins are digested into peptides by enzymes and analyzed by a mass spectrometer. A complete liquid chromatography-mass spectrometry (LC-MS) dataset contains thousands of single-stage spectra (MS1) and tandem MS spectra (MS2), which correspond to ionized peptides and their fragments, respectively. Accurate, high-throughput qualitative and quantitative analysis of proteins from LC-MS data is a primary goal of proteomics.

A proteomic data analysis framework contains many intermediate steps. According to their objectives, they can be grouped into three major steps: preprocessing, peptide-level analysis and protein-level analysis. This thesis makes the following contributions to each of these three steps.

In the preprocessing step, we survey single spectrum-based peak detection methods and compare their performance. In general, a peak detection procedure can be decomposed into three consecutive stages: smoothing, baseline correction and peak finding. We first categorize existing peak detection algorithms according to the techniques they use in each stage. This categorization reveals the differences and similarities among existing algorithms. We then choose five typical peak detection algorithms and conduct a comprehensive experimental study using both simulated data and real matrix-assisted laser desorption/ionization (MALDI) MS data. According to our study, the continuous wavelet transform-based method is the most effective one in practice.

In the peptide-level analysis step, we develop convex optimization models for peptide identification and peptide quantification. For peptide identification, we propose a new method named MIRanker, which uses information in the protein database and MS1 spectra to improve peptide identification results. In our experiments on a standard protein mixture dataset, a human dataset and a mouse dataset, MIRanker achieves better peptide re-ranking results than existing methods, including PeptideProphet, PeptideProphet plus the number of sibling peptides, and the score regularization method SRPI. For peptide quantification, we propose to estimate peptide abundance by taking advantage of the peptide isotopic distribution and the smoothness of the peptide elution profile. Our method solves the problem of overlapping peptides and provides a way to control the variance of the estimate.

In the protein-level analysis step, we develop a new protein identification method. It provides a combinatorial perspective on the protein inference problem by calculating conditional protein probabilities (the protein probability is the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound and an empirical estimate of the protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. We also study the relationship between our model and other methods such as the one-hit rule, greedy algorithms, and the state-of-the-art method ProteinProphet. The proposed method achieves better results than ProteinProphet and does so much more efficiently.
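The three-stage decomposition of peak detection named in the abstract (smoothing, baseline correction, peak finding) can be illustrated with a minimal sketch. This is not the thesis's code and not the continuous wavelet transform-based method it found most effective; it swaps in deliberately simple techniques (a moving average, a rolling-minimum baseline, and a local-maximum test against a noise threshold). The function names, window sizes, the `min_snr` parameter and the synthetic spectrum are all assumptions chosen for illustration.

```python
import numpy as np


def smooth(intensity, window=5):
    """Stage 1: moving-average smoothing; `window` is assumed odd."""
    half = window // 2
    padded = np.pad(intensity, half, mode="edge")  # avoid droop at the edges
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


def correct_baseline(intensity, window=51):
    """Stage 2: subtract a rolling-minimum baseline; `window` is assumed odd."""
    half = window // 2
    padded = np.pad(intensity, half, mode="edge")
    baseline = np.array(
        [padded[i:i + window].min() for i in range(len(intensity))]
    )
    return intensity - baseline


def pick_peaks(intensity, min_snr=3.0):
    """Stage 3: local maxima above median + min_snr * (MAD-based noise)."""
    median = np.median(intensity)
    noise = 1.4826 * np.median(np.abs(intensity - median))
    threshold = median + min_snr * noise
    mid = intensity[1:-1]
    is_peak = (mid > intensity[:-2]) & (mid >= intensity[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1


def detect_peaks(intensity):
    """Run the three stages in order on a single spectrum."""
    signal = np.asarray(intensity, dtype=float)
    return pick_peaks(correct_baseline(smooth(signal)))


# Hypothetical synthetic spectrum: two Gaussian peaks (at indices 150 and 350)
# on a sloping baseline, with additive Gaussian noise.
rng = np.random.default_rng(0)
x = np.arange(500)
spectrum = (
    20 + 0.02 * x
    + 100 * np.exp(-0.5 * ((x - 150) / 4) ** 2)
    + 60 * np.exp(-0.5 * ((x - 350) / 4) ** 2)
    + rng.normal(0, 1, x.size)
)
peaks = detect_peaks(spectrum)
print(peaks)
```

Keeping each stage as its own function mirrors the categorization described above: one stage can be replaced (for example, a wavelet-based peak finder in place of `pick_peaks`) without touching the others. A simple threshold like this one may also report a few noise maxima, which is one reason more robust peak finders are compared in practice.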

Bibliographic record

  • Author

    Yang, Chao

  • Author affiliation

    Hong Kong University of Science and Technology (Hong Kong)

  • Degree-granting institution: Hong Kong University of Science and Technology (Hong Kong)
  • Subject: Engineering Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 130 p.
  • Total pages: 130
  • Original format: PDF
  • Language of text: English