Journal: Bioinformatics

The necessity of adjusting tests of protein category enrichment in discovery proteomics



Abstract

Motivation: Enrichment tests are used in high-throughput experiments to measure the association between gene or protein expression and membership in groups or pathways. Fisher's exact test is commonly used. We examined the associations that the Fisher test produces between protein identification by mass spectrometry discovery proteomics and Gene Ontology (GO) term assignments in a large yeast dataset. We found that applying the Fisher test directly is misleading in proteomics, because mass spectrometry preferentially identifies proteins according to their biochemical properties. If this bias is not corrected, false inferences about associations can result. Our method adjusts Fisher tests for these biases and produces associations more directly attributable to protein expression than to experimental bias.

Results: Using logistic regression, we modeled the association between protein identification and GO term assignments while adjusting for identification bias in mass spectrometry. The model accounts for five biochemical properties of peptides: (i) hydrophobicity, (ii) molecular weight, (iii) transfer energy, (iv) beta turn frequency and (v) isoelectric point. The model was fit on 181,060 peptides from 2678 proteins identified in 24 yeast proteomics datasets at a 1% false discovery rate. In analyzing the association between protein identification and GO term assignments, we found that 25% (134 out of 544) of the Fisher tests that showed a significant association (q-value < 0.05) were non-significant after adjustment with our model. Simulations that generated yeast protein sets enriched for identification propensity showed that unadjusted enrichment tests were biased, whereas our approach worked well.
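The bias described in the abstract can be illustrated with a small sketch. It is not the authors' method: the abstract describes a logistic regression on five peptide properties, whereas this example swaps in a much simpler stratified comparison to show the same confounding effect. The function name `enrichment_p_value`, the two propensity strata and all counts are hypothetical, invented only for illustration. Within each stratum, proteins in the GO term are identified at the same rate as proteins outside it, so there is no real enrichment, yet pooling the strata makes a one-sided Fisher's exact test look highly significant.

```python
from math import comb


def enrichment_p_value(k, K, n, N):
    """One-sided Fisher's exact test (upper hypergeometric tail).

    k: identified proteins annotated with the term
    K: all proteins annotated with the term
    n: all identified proteins
    N: all proteins in the background
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts, invented for illustration only (not from the paper).
# Each stratum: (identified in term, in term, identified, total proteins).
strata = {
    "high_propensity": (120, 150, 160, 200),  # 80% identified in and out of the term
    "low_propensity": (10, 50, 40, 200),      # 20% identified in and out of the term
}

# Pooling the strata ignores propensity: 130 of 200 term proteins are
# identified versus 70 of 200 non-term proteins.
pooled = tuple(sum(column) for column in zip(*strata.values()))
pooled_p = enrichment_p_value(*pooled)
stratum_p = {name: enrichment_p_value(*counts) for name, counts in strata.items()}

print(f"pooled: counts={pooled}, p={pooled_p:.3g}")
for name, p in stratum_p.items():
    print(f"{name}: p={p:.3g}")
```

In this toy setup the pooled test reports a very small p-value only because the term is over-represented among easily identified proteins, while the per-stratum tests show no enrichment. A regression-based adjustment, as the abstract describes, serves the same purpose when identification propensity is continuous and depends on several properties at once.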
