Journal: Bioinformatics

A self-updating road map of The Cancer Genome Atlas

Abstract

Motivation: Since 2011, the files of The Cancer Genome Atlas (TCGA) have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to, and reproduced from, their source data. To realize this possibility, however, a continually updated road map of the files in the TCGA is required. Creating such a road map is a significant data modeling challenge because of the size and fluidity of the resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of available data files doubles approximately every 7 months.

Results: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, the engine uses JavaScript in conjunction with the World Wide Web Consortium’s (W3C) Resource Description Framework (RDF) and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index can be queried using SPARQL. It enables file-level provenance annotations as well as the discovery of arbitrary subsets of files based on their metadata, using web standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. Developing the TCGA Roadmap engine provided specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These design elements align with the concept of knowledge reengineering and represent a sharp departure from the top-down approaches of grid initiatives such as CaBIG. They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals.

Availability: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at http://bit.ly/TCGARoadmap. A video tutorial is available at http://bit.ly/TCGARoadmapTutorial.
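
The idea of indexing file metadata as triples and then selecting arbitrary subsets of files can be illustrated with a minimal Python sketch. This is not the authors' engine, which the abstract describes as JavaScript working with RDF and SPARQL. The directory layout, the file paths, the predicate names (`diseaseStudy`, `platform`, `fileName`) and all function names below are hypothetical, and a plain in-memory set of tuples stands in for an RDF store.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical file paths; the real TCGA directory layout is not given in the abstract.
EXAMPLE_PATHS = [
    "tumor/gbm/cgcc/platform_a/file1.txt",
    "tumor/gbm/cgcc/platform_b/file2.txt",
    "tumor/ov/cgcc/platform_a/file3.txt",
]


def path_to_triples(path: str) -> List[Triple]:
    """Turn one file path into metadata triples (predicate names are illustrative)."""
    # Assumed layout: tumor/<disease>/<center type>/<platform>/<file name>
    _, disease, _, platform, name = path.split("/")
    return [
        (path, "diseaseStudy", disease),
        (path, "platform", platform),
        (path, "fileName", name),
    ]


def build_index(paths: List[str]) -> Set[Triple]:
    """Collect the triples of every path into one in-memory index."""
    index: Set[Triple] = set()
    for path in paths:
        index.update(path_to_triples(path))
    return index


def match(index: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in index
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def files_where(index: Set[Triple], **constraints: str) -> List[str]:
    """Files whose metadata satisfies every predicate=value constraint.

    Roughly what a SPARQL query of this shape would express (vocabulary hypothetical):
      SELECT ?file WHERE { ?file :diseaseStudy "gbm" ; :platform "platform_a" }
    """
    subjects = {t[0] for t in index}
    for predicate, value in constraints.items():
        subjects &= {t[0] for t in match(index, p=predicate, o=value)}
    return sorted(subjects)


if __name__ == "__main__":
    index = build_index(EXAMPLE_PATHS)
    print(files_where(index, diseaseStudy="gbm", platform="platform_a"))
```

Because every fact about a file is stored as an independent triple, new metadata can be added without changing a schema, and any combination of constraints selects a subset of files. An actual deployment would delegate the matching step to a SPARQL endpoint rather than to a linear scan.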
