Journal: Future Generation Computer Systems

Parallelization of local BLAST service on workstation clusters



Abstract

This paper describes approaches to improving the performance of one of the most common and increasingly important tasks of the Human Genome Project (HGP): large-volume batch comparison of DNA sequence data. This basic comparison operation, usually carried out by the well-known BLAST program on one subject sequence against the internationally available databases of nearly five million target sequences, is already used hundreds of thousands of times each day by researchers around the world. At present, it is still used primarily in single-query or small-batch query mode. As the entire sequence of the human genome nears completion, the area of functional genomics, and the use of microarrays of sets of genes, are coming to the fore. These developments will demand ever more efficient means of BLASTing sets of data, which will make single-processor implementation on powerful workstations infeasible. We describe the three primary parallel components of BLAST. The first is at the sequence-to-sequence comparison level. The second parallelizes a single query across a partitioned and distributed database. Finally, the set of queries itself is partitioned across a set of servers with replicated or partitioned databases. The three methods may be employed alone or in concert. We describe our current implementation, which parallelizes batch requests, as well as our plans for implementing the other levels. The results will ultimately be applied to hardware assistance for this soon-to-be primitive computer operation.
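The second and third levels of parallelism named in the abstract (one query across a partitioned database, and a batch of queries across servers holding a replicated database) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the `score` function is a toy shared-k-mer count standing in for a real BLAST comparison, the "servers" are threads in one process, and the names `search`, `search_partitioned_db`, and `search_batch` are hypothetical. Threads in CPython will not speed up this pure-Python scoring; the sketch only shows how work is partitioned and how partial results are merged.

```python
from concurrent.futures import ThreadPoolExecutor


def kmers(seq, k=3):
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score(query, target, k=3):
    """Toy stand-in for a BLAST comparison: number of shared k-mers."""
    return len(kmers(query, k) & kmers(target, k))


def search(query, database):
    """Best (score, name) hit for one query against a non-empty database.

    The database is a dict mapping target name to sequence.
    """
    return max((score(query, seq), name) for name, seq in database.items())


def search_partitioned_db(query, database, n_shards):
    """One query, database split into shards searched concurrently.

    Each shard returns its local best hit; the global best is the
    maximum of the local results. Assumes a non-empty database.
    """
    names = sorted(database)
    shards = [{n: database[n] for n in names[i::n_shards]}
              for i in range(n_shards)]
    shards = [s for s in shards if s]  # drop empty shards
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return max(pool.map(lambda shard: search(query, shard), shards))


def search_batch(queries, database, n_workers):
    """Batch of queries spread across workers, each seeing the full
    (replicated) database. Results come back in query order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda q: search(q, database), queries))


if __name__ == "__main__":
    db = {"t1": "ACGTACGT", "t2": "TTTTGGGG", "t3": "ACGTTTTT"}
    print(search_batch(["ACGTAC", "TTTTGG"], db, n_workers=2))
    print(search_partitioned_db("ACGTAC", db, n_shards=2))
```

Because the merge step is a plain maximum over per-shard results, the partitioned search returns the same best hit as a serial search over the whole database. A real BLAST deployment would also have to account for database-size-dependent statistics when merging hits from shards, which this toy score ignores.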
