Journal: International Journal of Advances in Soft Computing and Its Applications

Genomic Repeat Detection Using the Knuth-Morris-Pratt Algorithm on R High-Performance-Computing Package


Abstract

Genomic repeat detection, which finds repeating base pairs in deoxyribonucleic acid (DNA) sequences, can be used to detect genetic disease by analyzing repetitions that exceed normal limits. Because this detection has a very high computation cost, this research builds a parallel-computing model and an implementation to solve it. This is achieved by modifying the Knuth-Morris-Pratt (KMP) algorithm and implementing it on the R high-performance-computing package ‘pbdMPI’. The model consists of the following steps: preprocessing and splitting the DNA sequence, running KMP in parallel with ‘pbdMPI’, combining all indices, and calculating the genomic repeats. To validate the model and implementation, 114 experiments involving human DNA sequences were conducted in standalone and parallel-computing scenarios. The results show that the proposed system reduces the computation cost, running more than 100 times faster than standalone computing. Comparisons of the computation cost in terms of the number of batches and the number of cores are presented, along with comparisons against existing research. In summary, the proposed model significantly reduces the computational cost.
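The four steps above (split, search, combine, count) can be illustrated with a minimal Python sketch. It is not the authors' R/‘pbdMPI’ implementation, and several parts are assumptions: the chunks are searched one after another rather than on separate MPI ranks, each chunk is extended by `len(pattern) - 1` bases so that a match straddling a chunk boundary is not lost, and the "genomic repeat" count is assumed to be the longest run of back-to-back copies of a short unit such as `CAG`. All function names are hypothetical.

```python
from typing import List, Tuple


def prefix_function(pattern: str) -> List[int]:
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> List[int]:
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []
    fail = prefix_function(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back instead of re-scanning the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - len(pattern) + 1)
            k = fail[k - 1]  # keep going so overlapping matches are found
    return hits


def split_with_overlap(seq: str, n_chunks: int, overlap: int) -> List[Tuple[int, str]]:
    """Step 1 (assumed scheme): cut seq into n_chunks pieces, each extended
    by `overlap` characters, and remember each piece's global offset."""
    if not seq:
        return []
    size = -(-len(seq) // n_chunks)  # ceiling division
    return [(start, seq[start:start + size + overlap])
            for start in range(0, len(seq), size)]


def parallel_kmp(seq: str, pattern: str, n_chunks: int) -> List[int]:
    """Steps 2 and 3: search every chunk, shift local hits to global
    indices, and merge them. The chunks are independent, so each one could
    be handed to a separate MPI rank; here they run sequentially."""
    chunks = split_with_overlap(seq, n_chunks, len(pattern) - 1)
    per_chunk = [[offset + i for i in kmp_search(chunk, pattern)]
                 for offset, chunk in chunks]
    # The set guards against any hit being reported by two chunks.
    return sorted({pos for hits in per_chunk for pos in hits})


def longest_tandem_run(hits: List[int], unit_len: int) -> int:
    """Step 4 (hypothetical definition): the largest number of
    back-to-back copies of the repeat unit."""
    hit_set = set(hits)
    best = 0
    for p in hits:
        if p - unit_len in hit_set:
            continue  # p is inside a run, not at its start
        run = 1
        while p + run * unit_len in hit_set:
            run += 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    seq = "ATG" + "CAG" * 5 + "TT" + "CAG" * 2 + "A"
    hits = parallel_kmp(seq, "CAG", n_chunks=4)
    print(hits)                         # [3, 6, 9, 12, 15, 20, 23]
    print(longest_tandem_run(hits, 3))  # 5
```

The overlap of `len(pattern) - 1` is what makes the merged indices identical to a single sequential KMP pass: any match that starts inside a chunk's own range also ends inside that chunk's extended window. Only the list comprehension in `parallel_kmp` would need to be distributed to obtain a parallel version.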