Journal: Genome Research

Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics

Abstract

Emerging Linked-Read technologies (aka read cloud or barcoded short-reads) have revived interest in short-read technology as a viable approach to understand large-scale structures in genomes and metagenomes. Linked-Read technologies, such as the 10x Chromium system, use a microfluidic system and a specialized set of 3′ barcodes (aka UIDs) to tag short DNA reads sourced from the same long fragment of DNA; subsequently, the tagged reads are sequenced on standard short-read platforms. This approach results in interesting compromises. Each long fragment of DNA is only sparsely covered by reads, no information about the ordering of reads from the same fragment is preserved, and 3′ barcodes match reads from roughly 2–20 long fragments of DNA. However, compared to long-read technologies, the cost per base to sequence is far lower, far less input DNA is required, and the per base error rate is that of Illumina short-reads. In this paper, we formally describe a particular algorithmic issue common to Linked-Read technology: the deconvolution of reads with a single 3′ barcode into clusters that represent single long fragments of DNA. We introduce Minerva, a graph-based algorithm that approximately solves the barcode deconvolution problem for metagenomic data (where reference genomes may be incomplete or unavailable). Additionally, we develop two demonstrations where the deconvolution of barcoded reads improves downstream results, improving the specificity of taxonomic assignments and of k-mer-based clustering. To the best of our knowledge, we are the first to address the problem of barcode deconvolution in metagenomics.
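
To make the barcode deconvolution problem concrete, here is a minimal toy sketch in Python. It is not the Minerva algorithm, whose details the abstract does not give. It assumes a simple heuristic: two reads under the same barcode probably come from the same long fragment if each shares a k-mer with reads from some common other barcode. The reads are linked in a graph on that basis, and the connected components are reported as clusters. The names `kmers`, `deconvolve`, `reads_by_barcode`, `k`, and `min_shared` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def deconvolve(barcode, reads_by_barcode, k=4, min_shared=1):
    """Split the reads of one barcode into clusters (toy heuristic, not Minerva)."""
    # Index every k-mer by the set of *other* barcodes whose reads contain it.
    kmer_to_barcodes = defaultdict(set)
    for other, reads in reads_by_barcode.items():
        if other == barcode:
            continue
        for read in reads:
            for km in kmers(read, k):
                kmer_to_barcodes[km].add(other)

    target = reads_by_barcode[barcode]
    # For each target read, collect the other barcodes it shares a k-mer with.
    linked = []
    for read in target:
        seen = set()
        for km in kmers(read, k):
            seen |= kmer_to_barcodes.get(km, set())
        linked.append(seen)

    # Union-find: join two reads that share at least min_shared other barcodes.
    parent = list(range(len(target)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(target)), 2):
        if len(linked[i] & linked[j]) >= min_shared:
            parent[find(i)] = find(j)

    # Each connected component is one putative long fragment.
    clusters = defaultdict(list)
    for i, read in enumerate(target):
        clusters[find(i)].append(read)
    return sorted(sorted(c) for c in clusters.values())


# Made-up example: barcode "A" mixes reads from two fragments.
example = {
    "A": ["AAAAC", "GGGGT", "TTCTC"],
    "B": ["AAAACA", "GGGGTA"],  # overlaps the first fragment
    "C": ["TTCTCA"],            # overlaps the second fragment
}
print(deconvolve("A", example))
```

In this made-up input, the first two reads of barcode "A" do not overlap each other, but both share k-mers with reads from barcode "B", so they are grouped together. The third read only matches barcode "C" and forms its own cluster. Real data would need far longer k-mers and filtering of k-mers that are common across many barcodes.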
