Conference: IEEE International Conference on Bioinformatics and Biomedicine

Deep vs. shallow learning-based filters of MS/MS spectra in support of protein search engines

Abstract

Although search time scales linearly with the number of observed spectra, current protein search engines, even the parallel versions, can take several hours to search a large volume of MS/MS spectra, which can be generated in a short time. After this laborious search, some (and at times the majority) of the observed spectra are labeled as non-identifiable. We evaluate the role of machine learning in building an efficient MS/MS filter to remove non-identifiable spectra. We compare a deep learning algorithm against nine shallow learning algorithms with different configurations. Using 10 datasets of different sizes, generated with two different search engines, on different instruments, and from different species, we show experimentally that deep learning models are powerful in filtering MS/MS spectra. We also show that our simple feature list is significant: the shallow learning algorithms likewise showed encouraging results in filtering MS/MS spectra. Our deep learning model can exclude around 50% of the non-identifiable spectra while losing, on average, only 9% of the identifiable ones. Among the shallow learning algorithms, Random Forest, Support Vector Machine, and Neural Networks showed encouraging results, eliminating, on average, 70% of the non-identifiable spectra while losing around 25% of the identifiable ones. The deep learning algorithm may be especially useful when the protein(s) of interest are present at lower cellular or tissue concentrations, while the other algorithms may be more useful for concentrated or more highly expressed proteins.
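
To make the reported trade-off concrete, the following is a minimal sketch of the arithmetic, not code from the paper. It assumes that search time is proportional to the number of spectra searched (as the abstract states) and uses a hypothetical dataset of 40,000 identifiable and 60,000 non-identifiable spectra; that split, the function name `filter_outcome`, and the `FilterOutcome` type are illustrative assumptions. Only the removal and loss rates (50% / 9% and 70% / 25%) come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class FilterOutcome:
    spectra_kept: float        # spectra passed on to the search engine
    workload_fraction: float   # share of the original search workload remaining
    identifiable_lost: float   # identifiable spectra wrongly discarded


def filter_outcome(n_identifiable, n_non_identifiable, removal_rate, loss_rate):
    """Estimate the effect of a pre-search filter.

    removal_rate: fraction of non-identifiable spectra the filter removes.
    loss_rate:    fraction of identifiable spectra the filter wrongly removes.
    Assumes search time is linear in the number of spectra searched.
    """
    kept = (n_identifiable * (1 - loss_rate)
            + n_non_identifiable * (1 - removal_rate))
    total = n_identifiable + n_non_identifiable
    return FilterOutcome(kept, kept / total, n_identifiable * loss_rate)


if __name__ == "__main__":
    # Hypothetical class split; the rates are the averages quoted in the abstract.
    deep = filter_outcome(40_000, 60_000, removal_rate=0.50, loss_rate=0.09)
    shallow = filter_outcome(40_000, 60_000, removal_rate=0.70, loss_rate=0.25)
    for name, result in (("deep", deep), ("shallow", shallow)):
        print(f"{name}: {result.workload_fraction:.1%} of workload remains, "
              f"{result.identifiable_lost:.0f} identifiable spectra lost")
```

Under these assumed counts, the shallow filters cut more of the search workload but discard more identifiable spectra. This matches the abstract's suggestion that the deep model suits cases where identifiable spectra for the protein of interest are scarce.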