Genome Research

SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing

Abstract

The latest revolution in the DNA sequencing field has been brought about by the development of automated sequencers that can generate gigabase-pair data sets quickly and at low cost. Because the generated reads are short, applications of such technologies seem to be limited to resequencing and transcript discovery. To extend their application to de novo sequencing, we developed the SHARCGS algorithm, which assembles short-read (25–40-mer) data with high accuracy and speed. The efficiency of SHARCGS was tested on BAC inserts from three eukaryotic species, on two yeast chromosomes, and on two bacterial genomes (Haemophilus influenzae, Escherichia coli). We show that, in simulations that take missing reads and wrong base calls into account, 30-mer-based BAC assemblies have N50 sizes >20 kbp for Drosophila and Arabidopsis and >4 kbp for human. Of the 949,974 contigs longer than 50 bp that we assembled, only one could not be aligned error-free against the reference sequences. We generated 36-mer reads for the genome of Helicobacter acinonychis on the Illumina 1G sequencing instrument and assembled 937 contigs covering 98% of the genome with an N50 size of 3.7 kbp. Except for five contigs that differ from the reference sequence at 1–4 positions, all contigs matched the genome error-free. Thus, SHARCGS is a suitable tool for fully exploiting novel sequencing technologies: it assembles sequence contigs de novo with high confidence and outperforms existing assembly algorithms in terms of speed and accuracy.
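The abstract reports assembly contiguity as N50 size. As a minimal illustration of that metric only (this is not code from SHARCGS), the sketch below uses the common definition: sort contig lengths in decreasing order and return the length at which the running total first reaches half of all assembled bases. The function name `n50` and the optional `min_length` filter are hypothetical; the filter mirrors the >50 bp cutoff the abstract uses when counting contigs, but whether the authors applied that cutoff when computing N50 is an assumption.

```python
def n50(contig_lengths, min_length=0):
    """Return the N50 of a collection of contig lengths (in bp).

    Hypothetical helper: N50 is the length L such that contigs of
    length >= L together contain at least half of all assembled bases.
    Contigs not longer than `min_length` are ignored (assumed cutoff).
    """
    # Keep only contigs above the (assumed) length cutoff, longest first.
    lengths = sorted((n for n in contig_lengths if n > min_length), reverse=True)
    if not lengths:
        raise ValueError("no contigs longer than min_length")

    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running >= half:
            return length
    return lengths[-1]  # not reached: running ends at sum(lengths) >= half


# Total = 300 bp; 100 + 80 = 180 >= 150, so N50 = 80.
print(n50([100, 80, 60, 40, 20]))
```

Some studies instead compute the statistic against the known genome size rather than the total assembly length (often called NG50); the two agree only when the assembly covers the whole genome.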