Journal: Radiation Oncology

Single-center versus multi-center data sets for molecular prognostic modeling: a simulation study


Abstract

Prognostic models based on high-dimensional omics data generated from clinical patient samples, such as tumor tissues or biopsies, are increasingly used to predict the success of radiotherapy. Model development requires two independent data sets, one for discovery and one for validation. Each may consist of samples collected at a single center or pooled from multiple centers. Multi-center data tend to be more heterogeneous than single-center data but are less affected by potential site-specific biases. Making optimal use of limited data resources for discovery and validation, with respect to the expected success of a study, requires dispassionate, objective decision-making. In this work, we addressed the impact of choosing single-center or multi-center data as discovery and validation data sets, and assessed how this impact depends on three data characteristics: signal strength, number of informative features and sample size.

We set up a simulation study to quantify the predictive performance of a model trained and validated on different combinations of in silico single-center and multi-center data. The standard bioinformatic analysis workflow of batch correction, feature selection and parameter estimation was emulated. Model quality was assessed with four measures: false discovery rate, prediction error, chance of successful validation (significant correlation between predicted and true outcome in the validation data) and model calibration.

In agreement with the literature on the generalizability of signatures, prognostic models fitted to multi-center data consistently outperformed their single-center counterparts when prediction error was the quality criterion of interest. However, for low signal strengths and small sample sizes, single-center discovery sets showed superior performance with respect to false discovery rate and chance of successful validation.

With regard to decision-making, this simulation study underlines the importance of defining study aims precisely a priori. Minimizing the prediction error requires multi-center discovery data, whereas single-center data are preferable with respect to false discovery rate and chance of successful validation when the expected signal or sample size is low. In contrast, the choice of validation data affects only the quality of the estimator of the prediction error, which was more precise on multi-center validation data.
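
The workflow named in the abstract (batch correction, feature selection, parameter estimation, then quality measures) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the additive per-center feature offset as the site-specific bias, within-center mean-centering as the batch correction, correlation ranking as the feature selection, ordinary least squares as the estimator, and all function names and parameter values. It reports the false discovery rate, the prediction error and the correlation between predicted and true validation outcome for a single run; significance testing and model calibration are left out.

```python
import numpy as np

def simulate(rng, n_per_center, n_centers, beta, site_sd=1.0):
    """Draw samples from n_centers centers; each center adds its own feature offset."""
    X_parts, y_parts, c_parts = [], [], []
    for c in range(n_centers):
        Z = rng.normal(size=(n_per_center, beta.size))     # latent biological signal
        offset = rng.normal(0.0, site_sd, size=beta.size)  # assumed site-specific bias
        X_parts.append(Z + offset)
        y_parts.append(Z @ beta + rng.normal(size=n_per_center))
        c_parts.append(np.full(n_per_center, c))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(c_parts)

def batch_correct(X, center):
    """Mean-center every feature within each center (a simple stand-in method)."""
    Xc = X.copy()
    for c in np.unique(center):
        Xc[center == c] -= X[center == c].mean(axis=0)
    return Xc

def select_features(X, y, k):
    """Indices of the k features with the largest absolute Pearson correlation to y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    r = Xs.T @ ys / len(y)
    return np.sort(np.argsort(-np.abs(r))[:k])

def fit_ols(X, y):
    """Least-squares coefficients; the first entry is the intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def run_once(seed, discovery_centers, validation_centers,
             n=120, p=200, n_informative=10, signal=0.5, k=10):
    """One simulated study: discover on one data set, validate on another."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[:n_informative] = signal  # only the first features are informative
    Xd, yd, cd = simulate(rng, n // discovery_centers, discovery_centers, beta)
    Xv, yv, cv = simulate(rng, n // validation_centers, validation_centers, beta)
    Xd, Xv = batch_correct(Xd, cd), batch_correct(Xv, cv)
    selected = select_features(Xd, yd, k)
    coef = fit_ols(Xd[:, selected], yd)
    pred = coef[0] + Xv[:, selected] @ coef[1:]
    return {
        "fdr": float(np.mean(selected >= n_informative)),
        "prediction_error": float(np.mean((yv - pred) ** 2)),
        "validation_correlation": float(np.corrcoef(pred, yv)[0, 1]),
    }

if __name__ == "__main__":
    # Compare the four discovery/validation combinations (1 center vs. 4 centers).
    for d, v in [(1, 1), (1, 4), (4, 1), (4, 4)]:
        print(d, v, run_once(0, d, v))
```

A single run says little on its own. Repeating `run_once` over many seeds and over a grid of `signal`, `n_informative` and `n` would be needed to estimate quantities such as the chance of successful validation, and a purely additive offset removed by centering is too simple to reproduce the trade-offs reported above.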
