PLoS Computational Biology

Inferring Correlation Networks from Genomic Survey Data

Abstract

High-throughput sequencing-based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities, be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results, since the observed data take the form of relative fractions of genes or species rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity.
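
The compositional effect described in the abstract can be illustrated with a small simulation. The sketch below is not SparCC and is not taken from the paper; it only shows how converting independent absolute abundances into relative fractions can induce correlation. The function names, the log-normal abundance model, and the use of the number of taxa as a crude stand-in for community diversity are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fraction_correlation(n_taxa, n_samples=2000, seed=0):
    """Correlate the relative fractions of taxa 0 and 1.

    Absolute abundances are drawn independently (hypothetical log-normal
    model), so any correlation between the fractions is an artifact of
    dividing by the per-sample total.
    """
    rng = random.Random(seed)
    f0, f1 = [], []
    for _ in range(n_samples):
        counts = [rng.lognormvariate(0.0, 1.0) for _ in range(n_taxa)]
        total = sum(counts)
        f0.append(counts[0] / total)  # relative fraction of taxon 0
        f1.append(counts[1] / total)  # relative fraction of taxon 1
    return pearson(f0, f1)


# Few taxa (assumed proxy for low diversity) vs. many taxa (high diversity).
low_diversity = fraction_correlation(n_taxa=2)
high_diversity = fraction_correlation(n_taxa=50)
print(f"2 taxa:  r = {low_diversity:.3f}")
print(f"50 taxa: r = {high_diversity:.3f}")
```

With only two taxa the fractions sum to one, so their correlation is -1 by construction even though the underlying abundances are independent. With many taxa the induced correlation is much weaker, which is consistent with the abstract's claim that the artifact is most acute in low-diversity samples.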
