BMC Genomics

Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey

Abstract

Background: The development of second generation sequencing methods has enabled large scale DNA variation studies at moderate cost. For the high throughput discovery of single nucleotide polymorphisms (SNPs) in species lacking a sequenced reference genome, we set up an analysis pipeline based on a short read de novo sequence assembler and a program designed to identify variation within short reads. To illustrate the potential of this technique, we present the results obtained with a randomly sheared, enzymatically generated, 2-3 kbp genome fraction of six pooled Meleagris gallopavo (turkey) individuals.

Results: A total of 100 million 36 bp reads were generated, representing approximately 5-6% (~62 Mbp) of the turkey genome, with an estimated sequence depth of 58. Reads consisting of bases called with less than 1% error probability were selected and assembled into contigs. Subsequently, high throughput discovery of nucleotide variation was performed with sequences of more than 90% reliability, using the assembled contigs of 50 bp or longer as the reference sequence. We identified more than 7,500 SNPs with a high probability of representing true nucleotide variation in turkeys. Extending the reference with publicly available turkey BAC-end sequences increased the number of SNPs to over 11,000. A comparison with the sequenced chicken genome indicated that the assembled turkey contigs were distributed uniformly across the turkey genome. Genotyping of a representative sample of 340 SNPs resulted in a SNP conversion rate of 95%. The correlation between the minor allele count (MAC) and the observed minor allele frequency (MAF) for the validated SNPs was 0.69.

Conclusion: We provide an efficient and cost-effective approach for the identification of thousands of high quality SNPs in species currently lacking a sequenced genome, and we applied it to turkey. The methodology targets a random fraction of the genome, resulting in an even distribution of SNPs across the targeted genome.
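
The depth figure quoted in the Results can be checked with simple arithmetic, and the "less than 1% error probability" criterion can be written as a per-read filter. The sketch below is only an illustration and not the authors' pipeline: the function names are hypothetical, and it assumes base qualities are standard Phred-scaled scores, which the abstract does not state.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# All function names here are hypothetical; this is not the authors' pipeline.

READ_COUNT = 100_000_000      # "100 million" reads (from the abstract)
READ_LENGTH_BP = 36           # read length in bp (from the abstract)
TARGET_FRACTION_BP = 62_000_000  # "~62 Mbp" genome fraction (from the abstract)


def expected_depth(read_count, read_length_bp, target_bp):
    """Average sequence depth = total sequenced bases / size of the target."""
    return read_count * read_length_bp / target_bp


def phred_to_error_probability(q):
    """Convert a Phred quality score to an error probability.

    ASSUMPTION: qualities are Phred-scaled (P = 10^(-Q/10)); the abstract
    does not say which quality encoding was used.
    """
    return 10 ** (-q / 10)


def passes_quality_filter(phred_scores, max_error=0.01):
    """Return True if every base in the read has error probability < max_error."""
    return all(phred_to_error_probability(q) < max_error for q in phred_scores)


depth = expected_depth(READ_COUNT, READ_LENGTH_BP, TARGET_FRACTION_BP)
print(f"Estimated depth: {depth:.1f}")
```

With the abstract's numbers, 3.6 Gbp of sequence over a ~62 Mbp target gives a depth that rounds to the reported 58. Under the Phred assumption, a read passes the filter only if every base has a quality score above 20.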
