Journal of Proteome Research

Machine Learning Strategy That Leverages Large Data sets to Boost Statistical Power in Small-Scale Experiments


Abstract

Machine learning methods have proven invaluable for increasing the sensitivity of peptide detection in proteomics experiments. Most modern tools, such as Percolator and PeptideProphet, use semisupervised algorithms to learn models directly from the data sets that they analyze. Although these methods are effective for many proteomics experiments, we suspected that they may be suboptimal for experiments of smaller scale. In this work, we found that the power and consistency of Percolator results were reduced as the size of the experiment was decreased. As an alternative, we propose a different operating mode for Percolator: learn a model with Percolator from a large data set and use the learned model to evaluate the small-scale experiment. We call this a "static modeling" approach, in contrast to Percolator's usual "dynamic model" that is trained anew for each data set. We applied this static modeling approach to two settings: small, gel-based experiments and single-cell proteomics. In both cases, static models increased the yield of detected peptides and eliminated the model-induced variability of the standard dynamic approach. These results suggest that static models are a powerful tool for bringing the full benefits of Percolator and other semisupervised algorithms to small-scale experiments.
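The contrast between the two operating modes can be sketched in a few lines. This is a hypothetical illustration only, not Percolator's actual implementation: it uses simulated target/decoy feature vectors and scikit-learn's `LogisticRegression` as a stand-in for Percolator's learned rescoring model, with `make_psms` being an invented helper.

```python
# Sketch of "dynamic" vs. "static" model use for PSM rescoring.
# Assumptions: simulated 3-feature PSM vectors stand in for real search
# engine features; LogisticRegression stands in for Percolator's model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_psms(n):
    """Simulate n target and n decoy PSMs; targets score higher on average."""
    X_target = rng.normal(1.0, 1.0, size=(n, 3))
    X_decoy = rng.normal(-1.0, 1.0, size=(n, 3))
    X = np.vstack([X_target, X_decoy])
    y = np.array([1] * n + [0] * n)  # 1 = target, 0 = decoy
    return X, y

# Dynamic mode: learn a fresh model from the small experiment itself.
# With few PSMs, the fitted weights vary run to run.
X_small, y_small = make_psms(20)
dynamic_model = LogisticRegression().fit(X_small, y_small)

# Static mode: learn once from a large reference data set, then only
# *apply* the frozen model to score the small experiment.
X_large, y_large = make_psms(5000)
static_model = LogisticRegression().fit(X_large, y_large)
small_scores = static_model.decision_function(X_small)
```

The key design point is that in static mode the small experiment contributes no training signal at all, which removes the model-induced variability the abstract describes while still benefiting from a well-trained discriminant.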
