MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics


Abstract

BACKGROUND: Metagenomics is revolutionizing the study of microorganisms and their involvement in biological, biomedical, and geochemical processes. Direct sequencing lets us investigate a tremendous diversity of organisms without prior cultivation. Unicellular eukaryotes play essential roles in most microbial communities as chief predators, decomposers, phototrophs, bacterial hosts, symbionts, and parasites of plants and animals. Investigating their roles is therefore of great interest to ecology, biotechnology, human health, and evolution. However, their generally lower sequencing coverage, their more complex gene and genome architectures, and a lack of eukaryote-specific experimental and computational procedures have kept them on the sidelines of metagenomics.

RESULTS: MetaEuk is a toolkit for high-throughput, reference-based discovery and annotation of protein-coding genes in eukaryotic metagenomic contigs. It performs fast searches with 6-frame-translated fragments covering all possible exons and optimally combines matches into multi-exon proteins. We used a benchmark of seven diverse, annotated genomes to show that MetaEuk is highly sensitive even under conditions of low sequence similarity to the reference database. To demonstrate MetaEuk's power to discover novel eukaryotic proteins in large-scale metagenomic data, we assembled contigs from 912 samples of the Tara Oceans project. MetaEuk predicted 12,000,000 protein-coding genes in 8 days on ten 16-core servers. Most of the discovered proteins are highly diverged from known proteins and originate from very sparsely sampled eukaryotic supergroups.

CONCLUSION: The open-source (GPLv3) MetaEuk software (https://github.com/soedinglab/metaeuk) enables large-scale eukaryotic metagenomics through reference-based, sensitive taxonomic and functional annotation.
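The "6-frame-translated fragments" mentioned in RESULTS can be illustrated with a minimal Python sketch. This is not MetaEuk's code: the function names, the choice to split each frame at stop codons, and the `min_len` threshold are assumptions made for illustration only, and the sketch uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): _AMINO[i]
               for i, c in enumerate(product(_BASES, repeat=3))}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    """Map (strand, offset) to the translation of that reading frame."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = translate(s[offset:])
    return frames


def candidate_fragments(seq, min_len=2):
    """Yield (strand, offset, peptide) for stop-free stretches.

    Splitting at stop codons and filtering by min_len are
    illustrative assumptions, not MetaEuk's documented parameters.
    """
    for (strand, offset), protein in six_frame(seq).items():
        for peptide in protein.split("*"):
            if len(peptide) >= min_len:
                yield strand, offset, peptide


if __name__ == "__main__":
    for fragment in candidate_fragments("ATGGCCTAAGGG"):
        print(fragment)
```

In a reference-based workflow, each such peptide fragment would then be searched against a protein database, and compatible hits on the same contig and strand would be combined into a multi-exon gene call. That combination step is not shown here.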
