
Shepard: A fast exact match short read aligner



Abstract

Mapping many short DNA sequences, called reads, to a long reference genome is a common task in molecular biology. The task amounts to a simple string search that allows for a few mismatches caused by mutations and imperfect read quality. While existing solutions try to align a high percentage of the reads within a small memory footprint, Shepard targets only exact matches and speed. On the human genome, Shepard is on the order of hundreds of thousands of times faster than current software implementations such as SOAP2 or Bowtie, and about 60 times faster than GPU implementations such as SOAP3. Shepard has two components: a software program that preprocesses a reference genome into a hash table, and a hardware pipeline that performs fast lookups. The hash table has one entry for each unique 100 base pair sequence that occurs in the reference genome; each entry holds the index of the last occurrence and the number of occurrences. To reduce the table size, a minimal perfect hash table is used. The hardware pipeline was designed to perform hash table lookups very quickly, on the order of 600 million lookups per second, and was implemented on a Convey HC-1 high-performance reconfigurable computing system. Shepard streams all of the short reads through this custom hardware pipeline and writes the alignment data (index of last occurrence and number of occurrences) to a binary results array.
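The table layout described in the abstract can be illustrated with a small software sketch. This is not Shepard's implementation: an ordinary Python `dict` stands in for the minimal perfect hash table and the hardware pipeline, the names `build_index` and `align_exact` are hypothetical, and the `(-1, 0)` record for reads with no exact match is an assumption, since the abstract does not say how misses are reported. A window of 4 bases is used instead of 100 only to keep the example readable.

```python
from typing import Dict, List, Tuple

# One record per unique k-mer: (index of last occurrence, number of occurrences).
Record = Tuple[int, int]

# Assumed sentinel for reads with no exact match (not specified in the abstract).
NO_MATCH: Record = (-1, 0)


def build_index(reference: str, k: int = 100) -> Dict[str, Record]:
    """Preprocess a reference into a table keyed by every k-length window."""
    index: Dict[str, Record] = {}
    for pos in range(len(reference) - k + 1):
        kmer = reference[pos:pos + k]
        _, count = index.get(kmer, NO_MATCH)
        # Later positions overwrite earlier ones, so the stored index
        # always ends up being the last occurrence.
        index[kmer] = (pos, count + 1)
    return index


def align_exact(reads: List[str], index: Dict[str, Record]) -> List[Record]:
    """Look up every read and return one result record per read, in order."""
    return [index.get(read, NO_MATCH) for read in reads]


# Toy example with k = 4 instead of 100.
reference = "ACGTACGTTACG"
index = build_index(reference, k=4)
results = align_exact(["ACGT", "TACG", "GGGG"], index)
print(results)  # [(4, 2), (8, 2), (-1, 0)]
```

Each result is a fixed-size pair, so the output for a batch of reads can be stored as a flat array with one record per read, which matches the "binary results array" the abstract mentions.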
