Proceedings of the National Academy of Sciences of the United States of America (via the US National Institutes of Health literature collection)

Predicting interpretability of metabolome models based on behavior, putative identity, and biological relevance of explanatory signals



Abstract

Powerful algorithms are required to deal with the high dimensionality of metabolomics data. Although many achieve high classification accuracy, the models they generate have limited value unless they can be shown to be reproducible and statistically relevant to the biological problem under investigation. Random forest (RF) generates models without any requirement for dimensionality reduction or feature selection; it ranks individual variables by significance and displays them explicitly. In metabolome fingerprinting by mass spectrometry, each metabolite can be represented by signals at several m/z values. Exploiting prior understanding of the expected biochemical differences between sample classes, we aimed to develop meaningful metrics of the significance of both the overall RF model and individual, potentially explanatory, signals. Pairwise comparison of related plant genotypes with strong phenotypic differences demonstrated that robust models are not only reproducible but also logically structured: they highlight correlated m/z signals derived from just a small number of explanatory metabolites that reflect the biological differences between sample classes. RF models were also generated by using groupings of samples known to be increasingly phenotypically similar. Although classification accuracy was often reasonable, we demonstrated reproducibly, in both Arabidopsis and potato, a performance threshold based on margin statistics beyond which such models showed little structure indicative of either generalizability or further biological interpretability. In a multiclass problem using 25 Arabidopsis genotypes, the ranking of metabolome signals by RF provided scope for deeper interpretability, despite the complicating effects of ecotype background and of secondary metabolome perturbations common to several mutations.
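The abstract refers to a performance threshold based on margin statistics without defining the statistic itself. As a minimal sketch, assuming the standard random-forest margin described by Breiman (the share of tree votes for a sample's true class minus the largest share for any other class, ideally computed from out-of-bag trees), the per-sample margin and its mean over samples could be computed as below. The helper names `rf_margin` and `mean_margin`, the vote-list input format, and the class labels are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def rf_margin(tree_votes: Sequence[Hashable], true_class: Hashable,
              classes: Sequence[Hashable]) -> float:
    """Breiman-style margin for one sample (hypothetical helper).

    tree_votes: the class predicted by each tree for this sample; ideally
    only trees for which the sample was out-of-bag.
    Returns a value in [-1, 1]: the fraction of votes for the true class
    minus the largest fraction of votes for any other class.
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    n = len(tree_votes)
    counts = Counter(tree_votes)
    p_true = counts[true_class] / n
    # Largest vote share among the competing (wrong) classes.
    p_other = max((counts[c] / n for c in classes if c != true_class), default=0.0)
    return p_true - p_other


def mean_margin(votes_per_sample: Sequence[Sequence[Hashable]],
                labels: Sequence[Hashable],
                classes: Sequence[Hashable]) -> float:
    """Average margin over samples, a simple summary of model strength."""
    if len(votes_per_sample) != len(labels) or not labels:
        raise ValueError("need one non-empty label per sample")
    margins = [rf_margin(v, y, classes) for v, y in zip(votes_per_sample, labels)]
    return sum(margins) / len(margins)


if __name__ == "__main__":
    classes = ["genotype A", "genotype B"]  # hypothetical class labels
    votes = [
        ["genotype A"] * 9 + ["genotype B"],      # confident, correct vote
        ["genotype A"] * 4 + ["genotype B"] * 6,  # majority votes for the wrong class
    ]
    labels = ["genotype A", "genotype A"]
    print(mean_margin(votes, labels, classes))
```

A mean margin near 1 means the trees vote confidently for the correct class, while values near 0 or below mean the classes are barely separated. How the authors chose their threshold is not stated in this abstract.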
