Journal: Proteome Science

An unsupervised machine learning method for assessing quality of tandem mass spectra



Abstract

Background: In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of these spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the pipeline of protein identification via tandem mass spectra, especially for reducing search time and the number of false identifications. Most existing methods for quality assessment are supervised machine learning methods based on a number of features that describe the quality of tandem mass spectra. These methods need training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data.

Results: This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra that requires no training dataset. The proposed method estimates the conditional probability that a spectrum is of high quality from the quality assessments based on individual features. The probabilities are estimated by solving a constrained optimization problem. An efficient algorithm is developed to solve this problem and is proved to be convergent. Experimental results on two datasets show that searching only the tandem spectra that the proposed method judges to be of high quality saves about 56% and 62% of database searching time, respectively, while losing only a small number of high-quality spectra.

Conclusions: The results indicate that the proposed method performs well for the quality assessment of tandem mass spectra and that the way the conditional probabilities are estimated is effective.
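The abstract does not state the constrained optimization problem or the algorithm that solves it, so the sketch below is only an illustrative stand-in, not the authors' method. It shows one common way to turn per-feature quality calls into a probability of high quality without labels: a two-class latent model with conditionally independent features, fitted by expectation-maximization. The function name `em_posteriors`, the binary vote encoding, and the simulated data are all assumptions made for this example.

```python
import math
import random


def em_posteriors(votes, n_iter=50, eps=1e-6):
    """Estimate P(high quality | votes) for each spectrum without labels.

    votes: list of rows; each row holds one 0/1 call per feature
           (1 = that feature alone rates the spectrum as high quality).
    Hypothetical stand-in model: two latent classes, features independent
    given the class, parameters fitted by EM.
    """
    n, m = len(votes), len(votes[0])
    # Start from the fraction of features voting "high" so that the
    # "high" class stays aligned with mostly positive votes.
    post = [sum(row) / m for row in votes]
    for _ in range(n_iter):
        # M-step: class prior and per-feature vote rates in each class.
        w_hi = sum(post)
        w_lo = n - w_hi
        prior = min(max(w_hi / n, eps), 1 - eps)
        p_hi, p_lo = [], []
        for j in range(m):
            a = sum(post[i] * votes[i][j] for i in range(n)) / max(w_hi, eps)
            b = sum((1 - post[i]) * votes[i][j] for i in range(n)) / max(w_lo, eps)
            p_hi.append(min(max(a, eps), 1 - eps))
            p_lo.append(min(max(b, eps), 1 - eps))
        # E-step: Bayes' rule in log space for numerical stability.
        new_post = []
        for row in votes:
            log_hi = math.log(prior)
            log_lo = math.log(1 - prior)
            for j, v in enumerate(row):
                log_hi += math.log(p_hi[j] if v else 1 - p_hi[j])
                log_lo += math.log(p_lo[j] if v else 1 - p_lo[j])
            d = max(min(log_lo - log_hi, 700.0), -700.0)
            new_post.append(1.0 / (1.0 + math.exp(d)))
        post = new_post
    return post


# Simulated data (not from the paper): 30% high-quality spectra, 4 features.
random.seed(0)
truth = [random.random() < 0.3 for _ in range(200)]
votes = [[int(random.random() < (0.85 if t else 0.2)) for _ in range(4)]
         for t in truth]
post = em_posteriors(votes)
# Spectra below the cut-off would be skipped in the database search.
kept = sum(p >= 0.5 for p in post)
print(f"searching {kept} of {len(post)} spectra")
```

In such a scheme the cut-off on the estimated probability trades search time against the number of high-quality spectra lost; the value 0.5 here is arbitrary.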
