Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

Steven Lewis; Attila Csordas; Sarah Killcoyne; Henning Hermjakob; Michael R Hoopmann; Robert L Moritz; Eric W Deutsch; John Boyle

Abstract

Background: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed.

Results: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed.

Conclusion: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.

Bibliographic details

Source: BMC Bioinformatics, 2012, Issue 1
Authors: Steven Lewis; Attila Csordas; Sarah Killcoyne; Henning Hermjakob; Michael R Hoopmann; Robert L Moritz; Eric W Deutsch; John Boyle
Original format: PDF
Chinese Library Classification: Biological sciences