Journal: BMC Genomics

MS2CNN: predicting MS/MS spectrum based on protein sequence using deep convolutional neural networks



Abstract

BACKGROUND: Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. For peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial for expanding the peptide space and increasing the coverage of spectral library search.

RESULTS: We propose MS2CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS2CNN was trained with five-fold cross validation on a three-way data split of the large-scale human HCD MS2 dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS2PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS2 spectra of 3+ peptides, MS2CNN is significantly better than both MS2PIP and pDeep.

CONCLUSIONS: We showed that MS2CNN outperforms MS2PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS2CNN, the proposed convolutional neural network model, generates highly accurate MS2 spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identification. The results suggest that incorporating more data into the deep learning model may improve performance.
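The two evaluation metrics named in the abstract, cosine similarity and Pearson correlation coefficient, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: it assumes the predicted and observed spectra have already been reduced to equal-length lists of fragment-ion intensities in the same order, and the function names and example intensities are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of the
    mean-centered vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


if __name__ == "__main__":
    # Hypothetical fragment-ion intensities, for illustration only.
    predicted = [0.0, 0.2, 1.0, 0.4]
    observed = [0.1, 0.3, 0.9, 0.5]
    print("cosine similarity:", cosine_similarity(predicted, observed))
    print("Pearson correlation:", pearson_correlation(predicted, observed))
```

Both scores reach 1 when the predicted intensities are a positive multiple of the observed ones. Pearson correlation additionally ignores a constant offset, which is one reason the two values reported for the same model can differ.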
