
Clustering co-abundant genes identifies components of the gut microbiome that are reproducibly associated with colorectal cancer and inflammatory bowel disease



Abstract

Whole-genome "shotgun" (WGS) metagenomic sequencing is an increasingly widely used tool for analyzing the metagenomic content of microbiome samples. While WGS data contain gene-level information, it can be challenging to analyze the millions of microbial genes that are typically found in microbiome experiments. To mitigate the ultrahigh dimensionality of gene-level metagenomics, it has been proposed to cluster genes by co-abundance to form Co-Abundant Gene groups (CAGs). However, exhaustive co-abundance clustering of millions of microbial genes across thousands of biological samples has previously been intractable because of the computational challenge of performing trillions of pairwise comparisons. Here we present a novel computational approach to the analysis of WGS datasets in which microbial gene groups are the fundamental unit of analysis. We use an Approximate Nearest Neighbor heuristic to perform near-exhaustive average linkage clustering, grouping millions of genes by co-abundance. This results in thousands of high-quality CAGs representing complete and partial microbial genomes. We applied this method to publicly available WGS microbiome surveys and found that the resulting microbial CAGs associated with inflammatory bowel disease (IBD) and colorectal cancer (CRC) were highly reproducible and could be validated in multiple independent cohorts. This approach to gene-level metagenomics provides a powerful path forward for identifying the biological links between the microbiome and human health. By proposing a new computational approach for handling high-dimensional metagenomics data, we identified specific disease-associated microbial gene groups that can be used to identify strains of interest for further preclinical and mechanistic experimentation.
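To make the idea of co-abundance clustering concrete, the following is a minimal sketch on toy data and is not the authors' implementation. It assumes cosine distance between per-sample abundance vectors and a hypothetical distance threshold of 0.1, neither of which is stated in the abstract. It also compares every pair of clusters exhaustively, whereas the abstract describes an Approximate Nearest Neighbor heuristic that avoids this at scale. The gene names, abundance values, and function names are all illustrative.

```python
import math
from itertools import combinations


def n_pairs(n_genes):
    """Number of pairwise comparisons needed for exhaustive clustering."""
    return n_genes * (n_genes - 1) // 2


def cosine_distance(u, v):
    """Cosine distance between two abundance vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero gene as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(cluster_a, cluster_b, abundance):
    """Mean distance over all gene pairs drawn from the two clusters."""
    dists = [
        cosine_distance(abundance[g], abundance[h])
        for g in cluster_a
        for h in cluster_b
    ]
    return sum(dists) / len(dists)


def cluster_cags(abundance, max_dist=0.1):
    """Greedy average-linkage clustering; stops once no pair is within max_dist.

    This exhaustive search is only feasible for toy inputs. max_dist is a
    hypothetical threshold chosen for this example.
    """
    clusters = [[gene] for gene in sorted(abundance)]
    while len(clusters) > 1:
        dist, i, j = min(
            (average_linkage(a, b, abundance), i, j)
            for (i, a), (j, b) in combinations(enumerate(clusters), 2)
        )
        if dist > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy abundance table: gene -> abundance in each of four samples (made-up values).
abundance = {
    "geneA": [10, 0, 5, 20],
    "geneB": [20, 0, 10, 40],  # proportional to geneA, so co-abundant with it
    "geneC": [0, 8, 1, 0],
    "geneD": [0, 16, 2, 0],    # proportional to geneC
    "geneE": [3, 3, 3, 3],     # flat profile, matches neither group
}

cags = cluster_cags(abundance)
pairs_for_two_million_genes = n_pairs(2_000_000)
```

On this toy table, genes whose abundances rise and fall together across samples end up in the same group, while the flat-profile gene stays on its own. The `n_pairs` helper shows why the exhaustive version does not scale: two million genes already imply roughly two trillion pairwise comparisons.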
