Efficient techniques for k-mer counting

Abstract

A large number of bioinformatics applications require counting k-length substrings (k-mers) in long, genetically important strings such as genome sequences. K-mer counting computes the frequency of each distinct k-length substring in a set of genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection, and many other applications use k-mer counting as a building block. Many approaches to the problem already exist: some are optimized for running time, others for memory usage. Most current solutions use multithreading to exploit all available cores of a machine, and a few efficient disk-based algorithms have been devised to reduce the required memory. We analyze all available algorithms, along with the time and memory requirements of their implementations. We then reduce running time by devising a novel algorithm for this problem. Our results show that this new algorithm outperforms the previous best-known algorithms.
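The basic frequency computation can be illustrated with a minimal sketch. It assumes plain uppercase string input and counts exact (non-canonical) k-mers. `count_kmers` is a naive in-memory hash-table baseline. `count_kmers_partitioned` illustrates the general bucket-partitioning idea that disk-based counters use to bound memory. Both function names are hypothetical, and neither function is the novel algorithm proposed in this paper.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Naive in-memory counting: one hash-table entry per distinct k-mer."""
    if k < 1:
        raise ValueError("k must be at least 1")
    counts: Counter = Counter()
    for seq in seqs:
        # Slide a window of length k over the sequence; each window is one k-mer.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def count_kmers_partitioned(seqs: Iterable[str], k: int, num_bins: int) -> Counter:
    """Two-pass partitioned counting (illustrative only).

    Pass 1 scatters every k-mer into one of num_bins buckets by hash, so
    identical k-mers always land in the same bucket. Pass 2 counts each
    bucket independently. Here buckets are in-memory lists; a disk-based
    tool would write them to files and load one bucket at a time.
    """
    if k < 1 or num_bins < 1:
        raise ValueError("k and num_bins must be at least 1")
    bins: List[List[str]] = [[] for _ in range(num_bins)]
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # crc32 gives a deterministic hash (unlike Python's salted hash()).
            bins[zlib.crc32(kmer.encode()) % num_bins].append(kmer)
    total: Counter = Counter()
    for bucket in bins:
        # Buckets hold disjoint sets of k-mers, so merging their counts is exact.
        total.update(Counter(bucket))
    return total


if __name__ == "__main__":
    print(count_kmers(["ACGTACGT"], 3))
```

For example, `count_kmers(["ACGTACGT"], 3)` yields `ACG` and `CGT` twice each and `GTA` and `TAC` once each. Partitioning trades an extra pass over the data for a smaller peak hash table, which is the time/memory trade-off the abstract describes.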