Journal of Mathematical Biology

A non-negative matrix factorization framework for identifying modular patterns in metagenomic profile data

Abstract

Metagenomic studies sequence DNA directly from environmental samples to explore the structure and function of complex microbial and viral communities. Individual short pieces of sequenced DNA (“reads”) are classified into (putative) taxonomic or metabolic groups, which are then analyzed for patterns across samples. Analysis of such read matrices is at the core of using metagenomic data to make inferences about ecosystem structure and function. Non-negative matrix factorization (NMF) is a numerical technique for approximating high-dimensional data points as non-negative linear combinations of non-negative components. It is thus well suited to interpreting observed samples as combinations of different components. We develop, test and apply an NMF-based framework to analyze metagenomic read matrices. In particular, we introduce a method for choosing the NMF degree (the number of components) in the presence of overlap, and apply spectral-reordering techniques to NMF-based similarity matrices to aid visualization. Using synthetic data sets, we show that our method can robustly identify the appropriate degree and disentangle overlapping contributions. We then examine and discuss the NMF decomposition of a metabolic profile matrix extracted from 39 publicly available metagenomic samples, and identify canonical sample types, including one associated with coral ecosystems, one associated with highly saline ecosystems, and others. We also identify specific associations between pathways and canonical environments, and explore how alternative choices of decomposition facilitate analysis of read matrices at a finer scale.
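The two building blocks named above, NMF of a read matrix and spectral reordering of an NMF-based similarity matrix, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the matrix `V` has pathways as rows and samples as columns, factors it with the standard Lee and Seung multiplicative updates (Frobenius loss), builds a sample-by-sample cosine similarity from the NMF weights `H`, and orders samples by the Fiedler vector of the graph Laplacian. The function names and the synthetic numbers are hypothetical, and the paper's overlap-aware method for choosing the degree `k` is not reproduced here.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=500, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H, with W (m x k) and
    H (k x n) non-negative, via Lee-Seung multiplicative updates that
    decrease the Frobenius reconstruction error ||V - W H||_F."""
    V = np.asarray(V, dtype=float)
    if np.any(V < 0):
        raise ValueError("V must be non-negative")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random start so no entry begins stuck at zero.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cosine_similarity_columns(H, eps=1e-12):
    """Cosine similarity between columns of H (one column per sample)."""
    Hn = H / (np.linalg.norm(H, axis=0, keepdims=True) + eps)
    return Hn.T @ Hn


def spectral_order(S):
    """Order items by the Fiedler vector (eigenvector of the second-smallest
    eigenvalue) of the graph Laplacian L = D - S of a similarity matrix S.
    Assumes S is non-negative and its similarity graph is connected."""
    S = (S + S.T) / 2.0
    L = np.diag(S.sum(axis=1)) - S
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    return np.argsort(eigvecs[:, 1])


# Synthetic example: 6 "pathways" x 8 "samples" built from two components
# with overlapping sample weights (made-up numbers, for illustration only).
W_true = np.array([[1.0, 0.0], [2.0, 0.0], [1.0, 0.0],
                   [0.0, 1.0], [0.0, 3.0], [0.0, 1.0]])
H_true = np.array([[1.0, 0.8, 0.6, 0.9, 0.1, 0.2, 0.0, 0.3],
                   [0.0, 0.2, 0.4, 0.1, 0.9, 0.8, 1.0, 0.7]])
V = W_true @ H_true

W, H = nmf_multiplicative(V, k=2, n_iter=2000)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
order = spectral_order(cosine_similarity_columns(H))
print(f"relative error: {rel_err:.4f}")
print("sample order:", order)
```

Multiplicative updates are chosen here only because they are short and keep the factors non-negative by construction; they can converge slowly, and a library solver such as scikit-learn's `sklearn.decomposition.NMF` is a practical alternative for real read matrices.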