Journal of Parallel and Distributed Computing

MPI framework for parallel searching in large biological databases


Abstract

In this paper, we use parallel processing to address the problem of searching huge biological databases, at least several gigabytes in size. Biological databases storing DNA sequences, protein sequences, or mass spectra are growing exponentially, and searches through them consume exponentially growing computational resources as well. We present a general-purpose, MPI-based C++ framework for generically splitting databases among several computational nodes. The combined RAM of the nodes working in tandem is often sufficient to keep the entire database in memory, and therefore to search it efficiently without paging to disk. The framework runs as a persistent service that processes all submitted queries, which allows for query reordering and better utilization of memory. As a result, we achieve superlinear speedups compared with single-processor implementations. We demonstrate the utility and speedup of the framework using a real biological database and an actual search algorithm for mass spectrometry. (C) 2006 Elsevier Inc. All rights reserved.
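The idea of splitting a database among nodes so that each node searches only its own in-memory share can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the framework described in the abstract: the names `Shard`, `shard_for`, `search_shard`, and `search_all` are hypothetical, the substring match stands in for a real search algorithm, and the nodes are simulated by a loop in a single process. In an MPI program, `rank` and `size` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`, and the per-node results would be combined with a gather operation instead of the loop shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Half-open range [begin, end) of database records owned by one node.
struct Shard {
    std::size_t begin;
    std::size_t end;
};

// Block partition (assumed scheme): split n records across `size` nodes as
// evenly as possible. The first (n % size) nodes receive one extra record.
Shard shard_for(std::size_t n, int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    const std::size_t s = static_cast<std::size_t>(size);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t base = n / s;
    const std::size_t extra = n % s;
    const std::size_t begin = r * base + std::min(r, extra);
    const std::size_t len = base + (r < extra ? 1 : 0);
    return {begin, begin + len};
}

// Toy search: indices of the records in one shard that contain the query.
std::vector<std::size_t> search_shard(const std::vector<std::string>& db,
                                      Shard shard,
                                      const std::string& query) {
    std::vector<std::size_t> hits;
    for (std::size_t i = shard.begin; i < shard.end; ++i) {
        if (db[i].find(query) != std::string::npos) {
            hits.push_back(i);
        }
    }
    return hits;
}

// Simulates every node in one process and merges the hits in rank order.
std::vector<std::size_t> search_all(const std::vector<std::string>& db,
                                    const std::string& query,
                                    int size) {
    std::vector<std::size_t> all;
    for (int rank = 0; rank < size; ++rank) {
        const Shard shard = shard_for(db.size(), rank, size);
        const std::vector<std::size_t> hits = search_shard(db, shard, query);
        all.insert(all.end(), hits.begin(), hits.end());
    }
    return all;
}
```

Because each simulated node touches only the records in its own range, the same partitioning lets a real cluster hold roughly 1/size of the database per node, which is what makes it possible for the combined RAM to hold the whole database.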
