BMC Bioinformatics

Probabilistic base calling of Solexa sequencing data

Abstract

Background: Solexa/Illumina short-read, ultra-high-throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges: currently, a fair proportion of the tags are routinely discarded because they cannot be matched to a reference sequence, which reduces the effective throughput of the technology.

Results: We propose a novel base-calling algorithm that uses model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content, removing uncertain bases towards the ends of the reads.

Conclusion: We show that, compared with Solexa's data processing pipeline, the method improves genome coverage and the number of usable tags by an average of 15%. An R package is provided that allows fast and accurate base calling of Solexa's fluorescence intensity files and produces informative diagnostic plots.
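The abstract does not spell out the clustering model or the exact information-content score, so what follows is only a minimal Python sketch of the two ideas under stated assumptions. It assumes per-base posterior probabilities for each sequencing cycle are already available (for example from a mixture model fitted to the four fluorescence channels). `call_base`, `best_subtag`, the `coverage` threshold and the `min_bits` cutoff are hypothetical names chosen for illustration, not the API of the authors' R package. Each ambiguous position is coded by the smallest set of bases whose probabilities reach `coverage`. As a swapped-in technique rather than the paper's own scoring rule, the sub-tag is chosen as the maximum-scoring contiguous segment of per-base information content.

```python
import math

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases each symbol stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(probs, coverage=0.9):
    """Return (IUPAC symbol, information content in bits) for one read position.

    probs maps each base to its posterior probability, assumed to come from an
    upstream clustering step. Bases are added in decreasing order of probability
    until their total reaches `coverage` (a hypothetical threshold).
    """
    chosen, total = [], 0.0
    for base, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(base)
        total += p
        if total >= coverage:
            break
    # A symbol covering k of the 4 bases carries log2(4/k) bits: 2 for A, 1 for R, 0 for N.
    return IUPAC[frozenset(chosen)], math.log2(4 / len(chosen))


def best_subtag(calls, min_bits=1.0):
    """Return (start, end), end exclusive, of the maximum-scoring contiguous sub-tag.

    Each position scores (information content - min_bits); `min_bits` is a
    hypothetical cutoff. The best segment is found with Kadane's algorithm;
    (0, 0) means no segment scores above zero.
    """
    best, best_score = (0, 0), 0.0
    run_start, run_score = 0, 0.0
    for i, (_, bits) in enumerate(calls):
        if run_score <= 0:  # a non-positive prefix never helps: restart here
            run_start, run_score = i, 0.0
        run_score += bits - min_bits
        if run_score > best_score:
            best, best_score = (run_start, i + 1), run_score
    return best


# Hypothetical posteriors for a 6-base read whose tail becomes uncertain.
posteriors = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.02, "C": 0.95, "G": 0.02, "T": 0.01},
    {"A": 0.55, "C": 0.02, "G": 0.41, "T": 0.02},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.30, "C": 0.25, "G": 0.25, "T": 0.20},
    {"A": 0.28, "C": 0.26, "G": 0.24, "T": 0.22},
]
calls = [call_base(p) for p in posteriors]
start, end = best_subtag(calls)
print("".join(s for s, _ in calls), "->", "".join(s for s, _ in calls[start:end]))
# ACRTNN -> ACRT
```

Scoring each base as its information content minus a fixed cutoff means confident calls extend the sub-tag, while runs of low-information symbols such as N at the end of a read are trimmed. Two-base codes such as R score zero under this cutoff, so they are kept only when they sit between confident bases.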