...
首页> 外文期刊>BMC Bioinformatics >Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research
【24h】

Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

机译:在FPGA硬件中加速字符串集匹配以进行生物信息学研究

获取原文
           

摘要

Background This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. Conclusion Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.
机译:背景技术本文介绍了用于加速字符串集匹配问题性能的技术,尤其着重于计算蛋白质组学中的应用。肽序列与在六个阅读框中翻译的基因组匹配的过程是蛋白基因组作图流水线的一部分,用作案例研究。 Aho-Corasick算法适用于以优化空间和性能的方式在现场可编程门阵列(FPGA)设备中执行。在这种方法中,传统的Aho-Corasick有限状态机(FSM)被拆分为多个较小的FSM,它们并行运行,每个FSM与输入翻译基因组中的多达20个肽段匹配。每个较小的FSM进一步分为五个更简单的FSM,以使每个简单FSM在输入中的单个位上运行(五个位足以代表蛋白质序列中的所有氨基酸和特殊符号)。结果通过Aho-Corasick实现的按位拆分组织,可以有效利用典型FPGA中可用的有限随机存取存储器(RAM)资源。使用片上RAM而不是FPGA逻辑资源来实现FSM,还可以快速重新配置FPGA,而不会出现与复杂数字设计相关的布局和布线延迟。结论实验结果表明,多个数据集的存储效率超过80%。此外,以100 MHz执行的FPGA实现比在2.67 GHz工作站上执行的传统Aho-Corasick算法的实现快近20倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号