A matching algorithm with isotope distribution pattern in LC-MS based on support vector machine (SVM) learning model

Abstract

In proteomics, it is important to detect, analyze, and quantify complex peptide components and their differences. The key step is to match the elution-time peaks (LC peaks) produced by the same peptide in replicate experiments. Warping functions are widely used to correct the mean time shift among replicates. However, they cannot resolve the ambiguity between corresponding and non-corresponding peak pairs, because the time shifts vary randomly for each extracted-ion chromatogram (XIC). In this paper, isotope distribution pattern similarity is considered in addition to the time feature. The novelty is that, unlike other feature-based methods that use isotope information, the algorithm relies on isotope distribution similarity rather than on the usual peak profile similarity. First, a training set of peptides, including corresponding and non-corresponding peak pairs, was selected from the MS/MS results. Second, the time difference and the isotope distribution pattern similarity were computed for each peak pair. Third, a support vector machine (SVM) classifier was applied to these two features. Finally, accuracy and final coverage were measured. A 10-fold cross-validation was first used to test the effectiveness of the SVM learning model; the accuracy of correct matching could reach 97%. The coverage obtained with the learning model was then evaluated and ranged from 75% to 91% across different datasets. Thus, this matching algorithm based on time and isotope distribution pattern features can provide high accuracy and coverage for corresponding peak identification.
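
The two per-pair features named in the abstract can be illustrated with a minimal sketch. The abstract does not state how isotope distribution similarity is computed, so cosine similarity is used here purely as an assumption; the peak dictionaries, their field names (`rt`, `isotopes`), and the function names are likewise hypothetical and not taken from the paper.

```python
import math


def isotope_similarity(a, b):
    """Cosine similarity between two isotope intensity vectors.

    Cosine similarity is an assumed stand-in; the paper's actual
    similarity measure is not given in the abstract.
    """
    if len(a) != len(b):
        raise ValueError("isotope vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty pattern is treated as dissimilar
    return dot / (norm_a * norm_b)


def pair_features(peak_a, peak_b):
    """Return (absolute time difference, isotope similarity) for a peak pair."""
    dt = abs(peak_a["rt"] - peak_b["rt"])
    sim = isotope_similarity(peak_a["isotopes"], peak_b["isotopes"])
    return dt, sim


# Hypothetical peaks: retention time in minutes, intensities of the
# first three isotope peaks. All values are made up for illustration.
peak_run1 = {"rt": 30.2, "isotopes": [100.0, 60.0, 20.0]}
peak_run2_same = {"rt": 30.5, "isotopes": [200.0, 120.0, 40.0]}
peak_run2_other = {"rt": 30.4, "isotopes": [20.0, 60.0, 100.0]}

features_same = pair_features(peak_run1, peak_run2_same)
features_other = pair_features(peak_run1, peak_run2_other)
```

In this toy case the second candidate is closer in time (0.2 min versus 0.3 min), yet its isotope pattern similarity is only about 0.54, while the first candidate scores about 1.0. Feature pairs of this kind, labeled as corresponding or non-corresponding, are what a two-feature SVM classifier would be trained on.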

Bibliographic details

  • Source
    RSC Advances | 2019, Issue 48 | 9 pages
  • Original format: PDF
  • Language of text: English
  • CLC classification: Chemistry
  • Date indexed: 2022-08-19 17:46:24
