BMC Bioinformatics

Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

Abstract

Background: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed.

Results: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed.

Conclusion: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
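
The abstract does not describe how the search is split into map and reduce steps, so the following is only a minimal sketch of one plausible decomposition, not Hydra's actual design. It assumes that peptides and spectra are keyed by an integer mass bin, that a reducer pairs the spectra and peptides sharing a bin, and that scoring (K-score in the paper) would be applied to those pairs afterwards. All function names, the bin width, and the tolerance are hypothetical; the sketch runs in plain Python rather than on Hadoop.

```python
from collections import defaultdict

# Standard monoisotopic residue masses in daltons (subset for illustration).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056  # mass of H2O added to the summed residues


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mass_bin(mass, width=1.0):
    """Hypothetical key: integer bin of the given width (assumption)."""
    return int(mass // width)


def map_peptide(seq):
    """Map step for the database: emit (bin, peptide record)."""
    yield mass_bin(peptide_mass(seq)), ("peptide", seq)


def map_spectrum(spec_id, precursor_mass):
    """Map step for spectra: emit to the bin and both neighbours so that
    matches within tolerance are not lost at bin edges (valid while the
    tolerance does not exceed the bin width)."""
    b = mass_bin(precursor_mass)
    for key in (b - 1, b, b + 1):
        yield key, ("spectrum", spec_id, precursor_mass)


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_bin(values, tol=0.5):
    """Reduce step: pair each spectrum with peptides within tol daltons.
    A real engine would score each pair here; scoring is omitted."""
    peptides = [v[1] for v in values if v[0] == "peptide"]
    spectra = [(v[1], v[2]) for v in values if v[0] == "spectrum"]
    for spec_id, mass in spectra:
        for seq in peptides:
            if abs(peptide_mass(seq) - mass) <= tol:
                yield spec_id, seq


def candidate_pairs(peptides, spectra):
    """Run map, shuffle, and reduce in-process; return sorted pairs."""
    emitted = []
    for seq in peptides:
        emitted.extend(map_peptide(seq))
    for spec_id, mass in spectra:
        emitted.extend(map_spectrum(spec_id, mass))
    result = []
    for values in shuffle(emitted).values():
        result.extend(reduce_bin(values))
    return sorted(result)


if __name__ == "__main__":
    peptides = ["GASP", "VLK"]
    spectra = [("s1", 330.2), ("s2", 358.3), ("s3", 500.0)]
    print(candidate_pairs(peptides, spectra))
```

Because each bin is reduced independently, the work in this sketch can be spread over as many reducers as there are bins, which is the general property behind the abstract's claim that throughput grows with cluster size.
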
