Journal of Proteome Research

Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates


Abstract

Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry-driven proteomics experiments are frequently performed with few biological or technical replicates because of sample scarcity, duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results in which detecting significant feature changes becomes a challenge. The problem is further exacerbated when detecting significant changes at the peptide level, for example, in phospho-proteomics experiments. To assess the extent of this problem and its implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, the standard t test, the moderated t test (also known as limma), and rank products, to detect significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using the limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets.
This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
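
The abstract does not spell out how the rank product method was adapted to missing values or how the two methods are combined, so the following is only a minimal sketch under stated assumptions, not the authors' R script. It assumes that missing values are encoded as None, that each feature is ranked only within the replicates where it was observed, that ranks are divided by the number of observed features in that replicate so that sparse and dense replicates are comparable, and that the rank product is the geometric mean over the available replicates. The names rank_desc, rank_products, and combine_hits are hypothetical, and treating "complementary" usage as a set union of the two hit lists is likewise an assumption. Significance estimation (for example by permutation) is omitted.

```python
import math


def rank_desc(values):
    """Rank observed values in descending order (1 = largest); None stays None.

    Ties receive the smallest rank of the tied group (a simplification).
    """
    present = sorted((v for v in values if v is not None), reverse=True)
    first_rank = {}
    for position, value in enumerate(present, start=1):
        first_rank.setdefault(value, position)
    return [None if v is None else first_rank[v] for v in values]


def rank_products(log_ratios):
    """Rank product per feature, tolerating missing values (assumed scheme).

    log_ratios: one list per feature, one entry per replicate, None if missing.
    Returns one value per feature in (0, 1], or None if never observed.
    Small values indicate consistently high log ratios across replicates.
    """
    n_replicates = len(log_ratios[0])
    relative_ranks = []
    for j in range(n_replicates):
        column = [row[j] for row in log_ratios]
        n_observed = sum(v is not None for v in column)
        ranks = rank_desc(column)
        # Assumption: scale by the number of observed features in this replicate.
        relative_ranks.append(
            [None if r is None else r / n_observed for r in ranks]
        )

    result = []
    for i in range(len(log_ratios)):
        available = [
            relative_ranks[j][i]
            for j in range(n_replicates)
            if relative_ranks[j][i] is not None
        ]
        if not available:
            result.append(None)  # feature missing in every replicate
        else:
            # Geometric mean over the replicates where the feature was seen.
            mean_log = sum(math.log(r) for r in available) / len(available)
            result.append(math.exp(mean_log))
    return result


def combine_hits(limma_hits, rank_product_hits):
    """Assumed combination rule: features flagged by either method."""
    return set(limma_hits) | set(rank_product_hits)


# Toy data: four features, three replicates, with missing values.
example = [
    [2.0, 1.8, None],    # feature A: strongly up, one value missing
    [0.1, None, 0.2],    # feature B: near zero, one value missing
    [-1.0, -0.9, -1.2],  # feature C: down in all replicates
    [None, None, None],  # feature D: never observed
]
example_rp = rank_products(example)
```

In this toy example feature A obtains the smallest rank product and feature D is reported as None because it was never quantified. To screen for down-regulated features under the same assumptions, the log ratios would be negated before ranking.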
