Deep vs. shallow learning-based filters of MS/MS spectra in support of protein search engines

Abstract

Although search time grows only linearly with the number of observed spectra, current protein search engines, even the parallel versions, can take several hours to search a large number of MS/MS spectra, which can be generated in a short time. After this laborious search, some (and at times the majority) of the observed spectra are labeled as non-identifiable. We evaluate the role of machine learning in building an efficient MS/MS filter that removes non-identifiable spectra before the search. We compare a deep learning algorithm against 9 shallow learning algorithms with different configurations. Using 10 datasets generated from two different search engines, different instruments, different sizes and different species, we show experimentally that deep learning models are powerful in filtering MS/MS spectra. We also show that our simple feature list is informative, as the shallow learning algorithms also gave encouraging results in filtering the MS/MS spectra. Our deep learning model can exclude around 50% of the non-identifiable spectra while losing, on average, only 9% of the identifiable ones. Among the shallow learning algorithms, Random Forest, Support Vector Machine and Neural Networks showed encouraging results, eliminating, on average, 70% of the non-identifiable spectra while losing around 25% of the identifiable ones. The deep learning algorithm may be especially useful where the protein(s) of interest are present at lower cellular or tissue concentrations, while the other algorithms may be more useful for concentrated or more highly expressed proteins.
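
The two figures quoted in the abstract (share of non-identifiable spectra removed, share of identifiable spectra lost) can be computed from a filter's keep/discard decisions as in the minimal sketch below. This is an illustration, not the authors' code: the function names are hypothetical, and the 60% share of non-identifiable spectra used in the example is an assumed value, not a result from the paper. The second function relies on the linear relation between spectrum count and search time stated in the abstract.

```python
def filter_rates(identifiable, kept):
    """Return (removed_non_identifiable_rate, lost_identifiable_rate).

    identifiable: booleans, True if the search engine identified the spectrum.
    kept: booleans, True if the filter passes the spectrum on to the search.
    """
    pairs = list(zip(identifiable, kept))
    n_ident = sum(1 for ident, _ in pairs if ident)
    n_non = len(pairs) - n_ident
    if n_ident == 0 or n_non == 0:
        raise ValueError("need at least one spectrum of each class")
    # Identifiable spectra that the filter wrongly discarded.
    lost = sum(1 for ident, k in pairs if ident and not k)
    # Non-identifiable spectra that the filter correctly discarded.
    removed = sum(1 for ident, k in pairs if not ident and not k)
    return removed / n_non, lost / n_ident


def remaining_search_fraction(non_ident_share, removed_rate, lost_rate):
    """Fraction of spectra still sent to the search engine.

    Under the linear time assumption this is also the fraction of the
    original search time that remains.
    """
    kept_non_ident = non_ident_share * (1 - removed_rate)
    kept_ident = (1 - non_ident_share) * (1 - lost_rate)
    return kept_non_ident + kept_ident


# Assumed example: 60% of spectra are non-identifiable (hypothetical value),
# combined with the deep learning rates from the abstract (50% removed, 9% lost).
print(remaining_search_fraction(0.6, 0.5, 0.09))  # about 0.66
```

With the assumed 60% share, roughly two thirds of the spectra would still be searched; a higher share of non-identifiable spectra would yield a larger saving.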

Bibliographic record

Source: IEEE International Conference on Bioinformatics and Biomedicine, 2017, pp. 1175-1182 (8 pages)
Authors: Majdi Maabreh; Basheer Qolomany; James Springstead; Izzat Alsmadi; Ajay Gupta
Original format: PDF
Keywords: Search engines; Machine learning algorithms; Proteins; Filtering; Machine learning; Databases; Filtering algorithms

Similar literature

1. Identification and characterization of disulfide bonds in proteins and peptides from tandem MS data by use of the MassMatrix MS/MS search engine [J]. Xu H, Zhang LW, Freitas MA. Journal of Proteome Research. 2008, no. 1
2. PTMTreeSearch: a novel two-stage tree-search algorithm with pruning rules for the identification of post-translational modification of proteins in MS/MS spectra [J]. Kertesz-Farkas Attila, Reiz Beata, Vera Roberto, Bioinformatics. 2014, no. 2
3. STEPS: A grid search methodology for optimized peptide identification filtering of MS/MS database search results [J]. Piehowski P.D., Petyuk V.A., Sandoval J.D., Proteomics. 2013, no. 5
4. Deep vs. Shallow Learning-based Filters of MS/MS Spectra in Support of Protein Search Engines [C]. Majdi Maabreh, Basheer Qolomany, James Springstead, IEEE International Conference on Bioinformatics and Biomedicine. 2017
5. Applications of Probabilistic Models on Peptide MS/MS Spectra Identification and Protein Quantification [D]. Ma, Chun Wai. 2014
6. Identification and Characterization of Disulfide Bonds in Proteins and Peptides from Tandem MS Data by Use of the MassMatrix MS/MS Search Engine [O]. Hua Xu, Liwen Zhang, Michael A. Freitas. 2008
