
Scaffolding algorithm using second and third-generation reads

Abstract

Second-generation sequencing methods produce high-quality short reads, which DNA assemblers assemble into contigs. Because the length of a single read is limited to 500 bp, it is hard to assemble full genomes or full chromosomes. Generating longer contigs at a low sequencing cost is a major goal of computer scientists in this area. We propose to link contigs created from second-generation reads using reads from third-generation sequencers, which are 10-20 kbp long. An existing implementation of this approach appears to be time- and memory-demanding for larger genomes. We developed an algorithm based on a Bloom filter and an extremely memory-efficient associative array. Our implementation substantially outperforms the previous one in terms of time and memory consumption. The presented algorithm, provided as a shared library, is part of the dnaasm de novo assembler. The library was written in C++ using the Boost and Google Sparse Hash libraries. Both a web browser-based graphical user interface and a command-line interface are provided. The application has been tested on real data from bacterial, yeast, and plant genomes.
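The abstract names a Bloom filter as one building block but does not say how it is used. The following is only a minimal sketch of the data structure, assuming it stores k-mers taken from contigs so that k-mers from a long read can be checked for membership cheaply. The class `KmerBloomFilter`, the helpers `addKmers` and `countKnownKmers`, the FNV-1a hashing, and all parameters are hypothetical and are not taken from dnaasm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a hash with a seed mixed in (illustrative choice, not dnaasm's hash).
static std::uint64_t fnv1a(const std::string& s, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical Bloom filter over k-mers: it may report false positives,
// but never false negatives.
class KmerBloomFilter {
public:
    KmerBloomFilter(std::size_t bits, unsigned hashes)
        : bits_(bits, false), hashes_(hashes) {
        assert(bits > 0 && hashes > 0);
    }

    void insert(const std::string& kmer) {
        for (unsigned i = 0; i < hashes_; ++i) bits_[index(kmer, i)] = true;
    }

    bool possiblyContains(const std::string& kmer) const {
        for (unsigned i = 0; i < hashes_; ++i) {
            if (!bits_[index(kmer, i)]) return false;  // definitely absent
        }
        return true;  // present, or a false positive
    }

private:
    // Double hashing: the i-th position is h1 + i * h2 modulo the bit count.
    std::size_t index(const std::string& kmer, unsigned i) const {
        const std::uint64_t h1 = fnv1a(kmer, 0);
        const std::uint64_t h2 = fnv1a(kmer, 0x9e3779b97f4a7c15ULL) | 1ULL;
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
    unsigned hashes_;
};

// Insert every k-mer of a sequence (for example, a contig) into the filter.
void addKmers(KmerBloomFilter& filter, const std::string& seq, std::size_t k) {
    if (k == 0 || seq.size() < k) return;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        filter.insert(seq.substr(i, k));
    }
}

// Count the k-mers of a sequence (for example, a long read) that the filter
// reports as possibly present.
std::size_t countKnownKmers(const KmerBloomFilter& filter,
                            const std::string& seq, std::size_t k) {
    if (k == 0 || seq.size() < k) return 0;
    std::size_t hits = 0;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        if (filter.possiblyContains(seq.substr(i, k))) ++hits;
    }
    return hits;
}
```

In this sketch, a caller would construct a filter such as `KmerBloomFilter f(1024, 3)`, call `addKmers` for each contig, and then call `countKnownKmers` on a long read to estimate how much of it overlaps known contig sequence. The filter size and hash count trade memory against the false-positive rate, so real genome data would need far larger values than this toy example.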
