Conference: Annual International Conference on Research in Computational Molecular Biology

A Concurrent Subtractive Assembly Approach for Identification of Disease Associated Sub-metagenomes


Abstract

Comparative analysis of metagenomes can be used to detect sub-metagenomes (species or gene sets) that are associated with specific phenotypes (e.g., host status). The typical workflow is to assemble and annotate metagenomic datasets individually or as a whole, followed by statistical tests to identify differentially abundant species/genes. We previously developed subtractive assembly (SA), a de novo assembly approach for comparative metagenomics that first detects differential reads that distinguish between two groups of metagenomes and then assembles only these reads. Application of SA to type 2 diabetes (T2D) microbiomes revealed new microbial genes associated with T2D. Here we further developed a Concurrent Subtractive Assembly (CoSA) approach, which uses a Wilcoxon rank-sum (WRS) test to detect k-mers that are differentially abundant between two groups of microbiomes (by contrast, SA only checks ratios of k-mer counts in one pooled sample versus the other). It then uses the identified differential k-mers to extract reads that are likely sequenced from the sub-metagenome with consistent abundance differences between the groups of microbiomes. Further, CoSA attempts to reduce the redundancy of reads (from abundant common species) by excluding reads containing abundant k-mers. Using simulated microbiome datasets and T2D datasets, we show that CoSA achieves strikingly better performance in detecting consistent changes than SA does, and it enables the detection and assembly of genomes and genes with minor abundance differences. An SVM classifier built upon the microbial genes detected by CoSA from the T2D datasets can accurately discriminate patients from healthy controls, with an AUC of 0.94 (10-fold cross-validation), and therefore these differential genes (207 genes) may serve as potential microbial marker genes for T2D.
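The k-mer test and read-extraction steps described in the abstract can be illustrated with a minimal Python sketch. This is not the CoSA implementation. The function names, the significance level `alpha`, and the thresholds `min_fraction` and `max_abundance` are hypothetical choices made for illustration, and the rank-sum test uses a normal approximation with tie correction, with no continuity correction, depth normalization, or multiple-testing correction.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of a read."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k):
    """Count k-mers across all reads of one sample."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        return 1.0
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u - n1 * n2 / 2.0) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def differential_kmers(group_a, group_b, alpha=0.05):
    """Return k-mers enriched in group_a (lists of per-sample Counters).

    alpha is a hypothetical cutoff; no multiple-testing correction is applied.
    """
    selected = set()
    for km in set().union(*group_a, *group_b):
        a = [c[km] for c in group_a]
        b = [c[km] for c in group_b]
        if sum(a) / len(a) > sum(b) / len(b) and rank_sum_pvalue(a, b) < alpha:
            selected.add(km)
    return selected


def extract_reads(reads, diff_kmers, pooled_counts, k,
                  min_fraction=0.5, max_abundance=1000):
    """Keep reads rich in differential k-mers and free of very abundant k-mers.

    min_fraction and max_abundance are hypothetical thresholds.
    """
    kept = []
    for read in reads:
        ks = list(kmers(read, k))
        if not ks:
            continue
        if any(pooled_counts[km] > max_abundance for km in ks):
            continue  # redundancy filter: skip reads with abundant k-mers
        if sum(km in diff_kmers for km in ks) / len(ks) >= min_fraction:
            kept.append(read)
    return kept
```

In this sketch each sample is reduced to a `Counter` of k-mer counts, the two groups are passed to `differential_kmers` as lists of such counters, and the surviving reads from `extract_reads` would then be handed to an assembler. Testing each k-mer across samples, rather than comparing two pooled counts, is what lets the test favor differences that are consistent within a group.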
