Single-center versus multi-center data sets for molecular prognostic modeling: a simulation study

Daniel Samaga; Roman Hornung; Herbert Braselmann; Julia Hess; Horst Zitzelsberger; Claus Belka; Anne-Laure Boulesteix; Kristian Unger

Abstract

Prognostic models based on high-dimensional omics data generated from clinical patient samples, such as tumor tissues or biopsies, are increasingly used to predict the success of radiotherapy. The model development process requires two independent data sets, one for discovery and one for validation. Each of them may contain samples collected in a single center or samples pooled from multiple centers. Multi-center data tend to be more heterogeneous than single-center data but are less affected by potential site-specific biases. Making optimal use of limited data resources for discovery and validation, with respect to the expected success of a study, requires dispassionate, objective decision-making. In this work, we addressed the impact of choosing single-center or multi-center data as discovery and validation data sets, and assessed how this impact depends on three data characteristics: signal strength, number of informative features, and sample size. We set up a simulation study to quantify the predictive performance of a model trained and validated on different combinations of in silico single-center and multi-center data. The standard bioinformatic analysis workflow of batch correction, feature selection and parameter estimation was emulated. Model quality was assessed with four measures: false discovery rate, prediction error, chance of successful validation (a significant correlation between predicted and true outcomes in the validation data) and model calibration. In agreement with the literature on the generalizability of signatures, prognostic models fitted to multi-center data consistently outperformed their single-center counterparts when prediction error was the quality criterion of interest. However, for low signal strengths and small sample sizes, single-center discovery sets showed superior performance with respect to false discovery rate and chance of successful validation.
With regard to decision-making, this simulation study underlines the importance of defining study aims precisely a priori. Minimizing the prediction error requires multi-center discovery data, whereas single-center data are preferable with respect to false discovery rate and chance of successful validation when the expected signal or sample size is low. In contrast, the choice of validation data affects only the quality of the prediction-error estimator, which was more precise on multi-center validation data.
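The abstract names the workflow steps (batch correction, feature selection, parameter estimation) and the four quality measures, but not their exact implementation. The Python sketch below is a hypothetical, standard-library-only illustration of one simulation replicate. It is not the authors' code. It rests on several assumptions:

- a continuous outcome with Gaussian noise (the abstract does not state the outcome type);
- additive per-center feature shifts as batch effects, and per-center perturbations of the true effects as site-specific bias;
- per-center mean-centering standing in for a dedicated batch-correction method such as ComBat;
- top-|r| univariate filtering for feature selection;
- per-feature least-squares slopes as coefficients, which is adequate only because the simulated features are independent;
- a calibration slope as the calibration summary;
- a one-sided Fisher z approximation to test the validation correlation.

All function names and parameter values are illustrative.

```python
import math
import random
import statistics


def simulate(center_sizes, p, k, signal, batch_sd, site_sd, rng):
    """Simulate one data set; returns features X, outcome y and center labels.

    Hypothetical model: features 0..k-1 are informative with effect `signal`.
    Each center adds a per-feature location shift to X (batch effect, sd
    `batch_sd`) and perturbs the true effects (site-specific bias, sd `site_sd`).
    """
    X, y, center = [], [], []
    for c, n_c in enumerate(center_sizes):
        shift = [rng.gauss(0, batch_sd) for _ in range(p)]
        beta_c = [signal + rng.gauss(0, site_sd) for _ in range(k)]
        for _ in range(n_c):
            latent = [rng.gauss(0, 1) for _ in range(p)]
            # zip stops after k terms, so only the informative features enter y.
            y.append(sum(b * x for b, x in zip(beta_c, latent)) + rng.gauss(0, 1))
            X.append([x + s for x, s in zip(latent, shift)])
            center.append(c)
    return X, y, center


def batch_correct(X, center):
    """Remove additive batch effects by mean-centering each feature per center."""
    Xc = [row[:] for row in X]
    for c in set(center):
        idx = [i for i, ci in enumerate(center) if ci == c]
        for j in range(len(X[0])):
            m = statistics.fmean(X[i][j] for i in idx)
            for i in idx:
                Xc[i][j] -= m
    return Xc


def slope(a, b):
    """Least-squares slope of b regressed on a (with intercept)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    return sab / sum((x - ma) ** 2 for x in a)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    return slope(a, b) * statistics.pstdev(a) / statistics.pstdev(b)


def fit(X, y, n_select):
    """Feature selection (top |r| with y) followed by parameter estimation.

    Per-feature slopes are used as coefficients, which is only reasonable here
    because the simulated features are independent.
    """
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)), key=lambda j: abs(pearson(cols[j], y)), reverse=True)
    coefs = {j: slope(cols[j], y) for j in ranked[:n_select]}
    return coefs, statistics.fmean(y)


def predict(model, X):
    """Linear predictor: intercept plus the selected features times their slopes."""
    coefs, intercept = model
    return [intercept + sum(b * row[j] for j, b in coefs.items()) for row in X]


def evaluate(model, k, X_val, y_val):
    """Four quality measures named in the abstract (implementations assumed)."""
    coefs, _ = model
    pred = predict(model, X_val)
    r = pearson(pred, y_val)
    # One-sided p-value for r > 0 using the Fisher z approximation.
    z = math.atanh(r) * math.sqrt(len(y_val) - 3)
    return {
        "fdr": sum(j >= k for j in coefs) / len(coefs),  # share of selected noise features
        "mse": statistics.fmean((yh - yt) ** 2 for yh, yt in zip(pred, y_val)),
        "p_value": 0.5 * math.erfc(z / math.sqrt(2)),
        "calibration_slope": slope(pred, y_val),  # observed on predicted; ideal value 1
    }


if __name__ == "__main__":
    rng = random.Random(1)
    p, k, signal, batch_sd, site_sd = 200, 5, 0.5, 1.0, 0.2
    # Multi-center validation set shared by both discovery designs.
    X_val, y_val, c_val = simulate([30, 30, 30], p, k, signal, batch_sd, site_sd, rng)
    X_val = batch_correct(X_val, c_val)
    # Same total discovery sample size (60) in both hypothetical designs.
    for name, sizes in {"single-center": [60], "multi-center": [20, 20, 20]}.items():
        X, y, c = simulate(sizes, p, k, signal, batch_sd, site_sd, rng)
        model = fit(batch_correct(X, c), y, n_select=10)
        print(name, evaluate(model, k, X_val, y_val))
```

Running the script prints the four measures for a single-center and a multi-center discovery set of equal total size. Both are evaluated on the same multi-center validation set. A single replicate is noisy. Comparisons like those summarized in the abstract would require averaging over many replicates while varying signal strength, number of informative features and sample size.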

Bibliographic details

Source
Radiation Oncology | 2020, Issue 1 | 14 pages
Keywords
Predictive model; Omics data; Feature selection; Predictive performance; Study design; Validation

Similar documents

1. Data modeling versus simulation modeling in the big data era: case study of a greenhouse control system [J]. Kim Byeong Soo, Kang Bong Gu, Choi Seon Han. Simulation, 2017, Issue 7.

2. Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study [J]. Andrea Marshall, Douglas G Altman, Patrick Royston. BMC Medical Research Methodology, 2010, Issue 1.

3. Context-dependent models (CRRM, MuRRM, PRRM, RAM) versus a context-free model (MNL) in transportation studies: a comprehensive comparisons for Swiss and German SP and RP data sets [J]. Transportmetrica, 2019, Issue 2.

4. Vector fields simplification: a case study of visualizing climate modeling and simulation data sets [C]. Pak Chung Wong, Harlan Foote, Ruby Leung. Conference on Visualization '00, 2000.

5. Data management and data facilitation in multi-center large cohort health care studies [D]. Wang, Wei. 2016.

6. Single-center versus multi-center data sets for molecular prognostic modeling: a simulation study [O]. Daniel Samaga, Roman Hornung, Herbert Braselmann. 2020.

7. P981 LVOT area measurement using gated CT data reclassifies aortic stenosis severity as graded by echocardiography; P982 Paradoxical low-flow low-gradient aortic stenosis: an intermediate state between moderate and severe aortic stenosis?; P983 Can rheumatic significant mitral stenosis be a cause of paradoxical low gradient, low flow, in patients with severe aortic stenosis? An echocardiographic and outcome study; P984 Clinical and hemodynamic comparison of isolated versus combined aortic and mitral stenosis; P985 Echocardiographic end-diastolic velocity in the proximal descending aorta should be interpreted with caution when the ascending aorta is dilated: insights from cardiovascular magnetic resonance; P987 Prevalence of atrial mitral regurgitation in patients with severe mitral regurgitation; P988 Role of 2D/3D echocardiography in the risk stratification of endocardial lead-related tricuspid regurgitation: a single-centre study among 241 patients; P989 When TEE is needed in patients with staphylococcus aureus bacteremia for the assessment of risk profile of infective endocarditis?; P990 Appropriateness criteria to echocardiograms for suspected infective endocarditis: experience of a tertiary referral center; P991 Independent predictors of outcome in infective endocarditis; P992 The role of transesophageal cardiography in clinical course and prognosis of complicated infective endocarditis in critically ill patients: our 15 years experience; P993 Left bundle branch block atypical pattern as a prognostic determinant in patients taken to TAVI; P994 Efficacy of long-term ivabradine therapy in severe systolic chronic heart failure patients with and without type 2 diabetes mellitus; P995 Relations between left ventricular reverse remodeling and serum markers of extracellular matrix fibrosis in dilated cardiomyopathy; P996 The healthy left ventricle accommodates an increasing vortex formation time for volume transfer in diastolic filling: implications for heart failure; P997 Evolutionary changes of pulmonary artery pressure after left ventricular assist device implant; P998 Functional correlates and prognostic value of coronary flow velocity reserve by vasodilator stress echocardiography in hypertrophic cardiomyopathy; P999 Quantification of myocardial performance in patients with non-obstructive versus latent-obstructive hypertrophic cardiomyopathy; P1000 Lifelong arrhythmic risk stratification in arrhythmogenic right ventricular cardiomyopathy: distribution of events and impact of periodical reassessment; P1001 Impact of fibrosis visualized by CMR in vectorcardiogram recordings of patients with suspected arrhythmogenic cardiomyopathy; P1002 Determinants of the beneficial effect of aldosterone antagonism on exercise capacity in heart failure with reduced ejection fraction; P1003 Myocardial strain values in patients with acute myocarditis and preserved ejection fraction. A magnetic resonance feature tracking study; P1004 Detection of subclinical left ventricular dysfunction by speckle tracking echocardiography in patients with myocarditis without prominent wall motion abnormalities; P1005 Aborted sudden cardiac death patients aged <50 years show only mild alterations on cardiac magnetic resonance imaging; P1006 Relationships between subepicardial and subendocardial longitudinal strain with late gadolinium enhancement in uncomplicated hypertensive patients [O]. L. Moderato, C. Di Nora, A. Soufiani. 2016.
