2012 IEEE International Conference on Bioinformatics and Biomedicine

Manifold learning reveals nonlinear structure in metagenomic profiles

Abstract

Using metagenomics to detect the global structure of microbial communities remains a significant challenge. The structure and functions of a microbial community are complicated not only by the complex interactions among microbes but also by their interactions with confounding environmental factors. Recently, dimension reduction methods such as principal component analysis (PCA), non-negative matrix factorization (NMF) and canonical correlation analysis (CCA) have been used extensively to investigate the complex structure embedded in metagenomic profiles, which summarize the abundance of functional or taxonomic categories in metagenomic studies. However, metagenomic profiles do not necessarily satisfy the "assumption of linearity" behind these methods. It is therefore worth investigating how nonlinear methods can be used in metagenomic studies. In this paper, Isomap, a nonlinear manifold learning method, is used to visualize and analyze large-scale metagenomic profiles. Isomap was applied to a large-scale Pfam profile derived from 45 metagenomes from the Global Ocean Sampling expedition. In our results, a novel nonlinear structure of protein families is identified, and the relationships between the identified nonlinear components and environmental factors of the global ocean are explored. The results indicate the strength of nonlinear methods in learning complex microbial structure. With the huge number of newly sequenced metagenomes now becoming available, nonlinear methods such as Isomap could be necessary complements to the currently widely used methods.
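The abstract describes Isomap only at a high level. As a rough illustration of why a nonlinear method can recover structure that a linear one misses, the sketch below implements the standard Isomap recipe (k-nearest-neighbour graph, shortest-path "geodesic" distances, classical MDS) in plain NumPy and compares its first coordinate with the first principal component. It runs on a synthetic spiral, not on the Global Ocean Sampling Pfam profile; the helper names `isomap`, `pca_scores` and `spearman`, the choice `n_neighbors=8`, and the toy data are all assumptions for illustration, not the authors' pipeline or parameters.

```python
import numpy as np

# Hypothetical helpers and toy data for illustration only; not the paper's code.


def isomap(X, n_neighbors=8, n_components=2):
    """Minimal Isomap: kNN graph -> graph geodesic distances -> classical MDS.

    X is an (n_items, n_features) matrix, e.g. a hypothetical abundance
    profile whose rows are the items to embed.
    """
    n = X.shape[0]
    # 1. Pairwise Euclidean distances between rows.
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

    # 2. Symmetric k-nearest-neighbour graph (self excluded); non-edges are infinite.
    nn = np.argsort(D + np.diag(np.full(n, np.inf)), axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nn.ravel()
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    G[rows, cols] = D[rows, cols]
    G[cols, rows] = D[rows, cols]

    # 3. Geodesic distances = all-pairs shortest paths (Floyd-Warshall).
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; raise n_neighbors")

    # 4. Classical MDS: double-centre squared geodesics, keep top eigenvectors.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def pca_scores(X, n_components=2):
    """Linear baseline: projections onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def spearman(a, b):
    """Spearman rank correlation (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]


# Synthetic toy data (NOT metagenomic data): points along a planar spiral,
# a 1-D curve that is strongly nonlinear in its 2-D feature space.
t = np.linspace(np.pi, 4 * np.pi, 300)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])

# Absolute values because the sign of an embedding axis is arbitrary.
iso_rho = abs(spearman(isomap(X)[:, 0], t))
pca_rho = abs(spearman(pca_scores(X)[:, 0], t))
print(f"|Spearman| with position along curve: Isomap={iso_rho:.3f}, PCA={pca_rho:.3f}")
```

On this toy spiral the first Isomap coordinate should track position along the curve almost perfectly, while the first principal component, being a single linear projection, folds distant parts of the spiral together. The Floyd-Warshall step is O(n³), which is acceptable for a few hundred rows; for a profile with thousands of Pfam families, scikit-learn's `sklearn.manifold.Isomap`, which can compute shortest paths with Dijkstra's algorithm on a sparse neighbourhood graph, is the more practical choice.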