Journal: GigaScience

A benchmark study of k-mer counting methods for high-throughput sequencing



Abstract

The rapid development of high-throughput sequencing technologies means that hundreds of gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools require counts of substrings of length k in DNA/RNA sequencing reads obtained for applications such as genome and transcriptome assembly, error correction, multiple sequence alignment, and repeat detection. Recently, several techniques have been developed to count k-mers in large sequencing datasets, with a trade-off between the time and memory required to perform this function. We assessed several k-mer counting programs and evaluated their relative performance, primarily on the basis of runtime and memory usage. We also considered additional parameters such as disk usage, accuracy, parallelism, the impact of compressed input, performance in terms of counting large k values, and the scalability of the application to larger datasets. We make specific recommendations for the setup of a current state-of-the-art program and suggestions for further development.
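To make the counting task concrete, here is a minimal Python sketch of naive in-memory k-mer counting. It is an illustration only and is not the method of any program assessed in the study; the names `count_kmers` and `reverse_complement` and the `canonical` option are assumptions introduced here. Because it stores every distinct k-mer in a hash table, its memory use grows with the number of distinct k-mers, which hints at why dedicated tools trade time against memory.

```python
from collections import Counter

# Translation table for complementing DNA bases (hypothetical helper).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_kmers(reads, k, canonical=False):
    """Count every length-k substring across the given reads.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped. With canonical=True, a k-mer and its reverse complement
    are counted together under the lexicographically smaller string.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # ambiguous base in this window
            if canonical:
                kmer = min(kmer, reverse_complement(kmer))
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "CGTNAC"]  # toy input, not real sequencing data
    for kmer, n in sorted(count_kmers(toy_reads, 3).items()):
        print(kmer, n)
```

For real sequencing datasets of hundreds of gigabytes, a dictionary-based approach like this would likely exhaust memory, which is the problem the benchmarked programs are designed to address.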
