Journal: Translational Andrology and Urology

An overview of publicly available patient-centered prostate cancer datasets


Abstract

Prostate cancer (PCa) is the second most common cancer in men and the second leading cause of cancer death in men. Many PCa studies have been carried out, each requiring considerable time before its data are collected and ready for analysis. However, a wide range of PCa datasets is already available on the internet; these could be used for data mining, predictive modelling or other purposes, reducing the need to set up new studies to collect data. In the current scientific climate, which is moving increasingly toward the analysis of “big data” and toward large, international, multi-site projects built on modern IT infrastructure, these datasets could prove extremely valuable. This review presents an overview of publicly available patient-centered PCa datasets, divided into three categories (clinical, genomics and imaging) plus an “overall” section, so that researchers can select a suitable dataset for analysis without spending days searching for the right data. To compile a list of human PCa databases, we searched scientific literature databases and academic social network sites, and also drew on information from other reviews. All databases in the combined list were then checked for public availability. Only databases that were directly publicly available, or available after signing a research data agreement or obtaining a free login, were included in this review. The data also had to be available to commercial parties. This paper focuses on patient-centered data, so the genomics section does not include gene-centered or pathway-centered databases. We identified 42 publicly available, patient-centered PCa datasets. Some of these consist of several smaller datasets, and some combine data from more than one of the three domains (clinical, imaging and genomics). Only one dataset contains information from all three domains.
This review presents all datasets and their characteristics: number of subjects, clinical fields, imaging modalities, expression data, mutation data, biomarker measurements, etc. Although every effort was made to make this overview of publicly available databases as extensive as possible, it is very likely incomplete and will soon be outdated. Nevertheless, it may help many PCa researchers find suitable datasets with which to answer their research questions, without needing to start a new data collection project. In the coming era of big data analysis, overviews like this are becoming increasingly useful.
