Federation in genomics pipelines: techniques and challenges


Abstract

Federation is a popular concept in building distributed cyberinfrastructures, whereby computational resources are provided by multiple organizations through a unified portal, decreasing the complexity of moving data back and forth among multiple organizations. Federation has been used in bioinformatics only to a limited extent, namely, federation of datastores, e.g. the SBGrid Consortium for structural biology and the Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, which face fast-increasing data volumes and processing requirements. A prime example, and one that we discuss here, is in genomics and metagenomics. It is critical that the processing of the data be done without having to transport the data across large network distances. We exemplify our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. Currently, it is hosted completely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Because MG-RAST is a widely used resource, we have to move toward federation without disrupting its 50 K annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in federating such infrastructures. We hope that this manuscript can spur greater federation of bioinformatics infrastructures by showing the steps involved, and thus allow them to scale to support larger user bases.


Bibliographic details

Journal: Briefings in Bioinformatics
Authors: Somali Chaterji; Jinkyu Koo; Ninghui Li; Folker Meyer; Ananth Grama; Saurabh Bagchi
Year (volume), issue: 2019 (20), 1
Pages: 235–244
Total pages: 10
Original format: PDF
Chinese Library Classification: biochemical genetics; biochemical pharmacology
Keywords: computational genomics; cyberinfrastructure; federation; identity management; MG-RAST; genomic privacy
Date added to database: 2022-08-21 10:51:58

Similar literature

1. Federation in genomics pipelines: techniques and challenges [J]. Somali Chaterji, Jinkyu Koo, Ninghui Li. Briefings in Bioinformatics. 2019, issue 1.
2. Genomics pipelines and data integration: challenges and opportunities in the research setting [J]. Davis-Turak Jeremy, Courtney Sean M., Hazard E. Starr. Expert Review of Molecular Diagnostics. 2017, issue 1a6.
3. Pipelines Aim to Stay Ahead of Opponents on Legal Challenges and Protests. At two different gas industry conferences recently, pipeline proponents discussed successes and challenges in building new infrastructure in an era when nearly every project is contested, with court decisions both favorable and unfavorable for pipeline developers. [J]. Foster Natural Gas Report. 2017, issue 3168.
4. Massively Parallel Multi-Chip Imputation Pipeline and Genomic Data Warehouse Apply a Novel Genomic Tiling Technology [C]. Elen Sukharevsky. International Molecular Medicine Tri-Conference. 2019.
5. Applied genomics: Development of bioinformatics pipelines for analyzing clinical pediatric genomic data [D]. Crowgey, Erin L. 2016.
6. Genomics pipelines and data integration: challenges and opportunities in the research setting [O]. Jeremy Davis-Turak, Sean M. Courtney, E. Starr Hazard.
7. Unravelling the genomic landscape of leukemia using NGS techniques: the challenge remains [O]. Jieun Kim. 2017.
8. Shewanella Federation: Functional Genomic Investigations of Dissimilatory Metal-Reducing Shewanella. Final Report [R]. Jizhong, Z., Zhili, H. 2009.
