
Chemical Class Prediction of Unknown Biomolecules Using Ion Mobility-Mass Spectrometry and Machine Learning: Supervised Inference of Feature Taxonomy from Ensemble Randomization



Abstract

This work presents a machine learning algorithm, the supervised inference of feature taxonomy from ensemble randomization (SIFTER), which supports the identification of features derived from untargeted ion mobility-mass spectrometry (IM-MS) experiments. SIFTER applies random forest machine learning to three analytical measurements derived from IM-MS, namely collision cross section (CCS), mass-to-charge ratio (m/z), and mass defect (Δm), to classify unknown features into a taxonomy of chemical kingdom, super class, class, and subclass. Each classification is assigned a calculated probability, and alternate classifications are reported with their associated probabilities. After optimization, SIFTER was tested against a set of molecules not used in the training set. The average success rate in classifying all four taxonomy categories correctly was found to be >99%. Analysis of molecular features detected from a complex biological matrix, and likewise not used in the training set, yielded a lower success rate: all four categories were correctly predicted for ~80% of the compounds. This decline in performance is due in part to the incompleteness of the training set across all potential taxonomic categories, and in part to a nearest-neighbor bias in the random forest algorithm. Ongoing efforts are focused on improving the class prediction accuracy of SIFTER through expansion of the empirical data sets used for training as well as improvements to the core algorithm.
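The abstract says each classification carries a probability and that alternate classifications are reported with their own probabilities. A minimal sketch of how an ensemble of trees could produce such a ranked list is shown below. It is not the SIFTER implementation: the function names, the example class labels, the vote counts, and the simplified mass defect definition (fractional part of m/z) are all assumptions made for illustration.

```python
import math
from collections import Counter


def mass_defect(mz):
    """Fractional part of m/z.

    Assumption: a simplified stand-in for the mass defect feature;
    the source does not state how SIFTER computes it.
    """
    return mz - math.floor(mz)


def rank_classes(tree_votes):
    """Convert per-tree class votes into (label, probability) pairs.

    The probability is the share of trees voting for that label.
    Results are sorted best first, with ties broken by label name.
    """
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return sorted(
        ((label, n / total) for label, n in counts.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical votes from a 10-tree ensemble for one unknown feature.
votes = ["lipid"] * 7 + ["peptide"] * 2 + ["carbohydrate"]
ranked = rank_classes(votes)

top_label, top_probability = ranked[0]  # primary classification
alternates = ranked[1:]                 # alternate classifications
```

In this sketch the primary classification is the label with the largest vote share, and the remaining labels form the alternates. The same ranking step would be repeated at each taxonomy level (kingdom, super class, class, subclass).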

Bibliographic details

  • Source
    Analytical Chemistry | 2020, Issue 15 | 9 pages
  • Author affiliations

    Vanderbilt Univ Ctr Innovat Technol Vanderbilt Ingram Canc Ctr Vanderbilt Inst Chem Biol Dept Chem Vanderbilt In 221 Kirkland Hall Nashville TN 37235 USA;


  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Analytical chemistry
