Missing data imputation and synthetic data simulation through modeling graphical probabilistic dependencies between variables (ModGraProDep): An application to breast cancer survival

Abstract

Background: Two common issues may arise in certain population-based breast cancer (BC) survival studies: (I) missing values in a predictive variable of survival, such as "Stage" at diagnosis, and (II) small sample sizes caused by the "class imbalance problem" in certain subsets of patients, which call for data modeling/simulation methods.

Methods: We present a procedure, ModGraProDep, based on graphical modeling (GM) of a dataset to overcome these two issues. The performance of the models derived from ModGraProDep is compared with a set of frequently used classification and machine learning algorithms (Missing Data Problem) and with oversampling algorithms (Synthetic Data Simulation). For the Missing Data Problem we assessed two scenarios: missing completely at random (MCAR) and missing not at random (MNAR). Two validated BC datasets provided by the cancer registries of Girona and Tarragona (northeastern Spain) were used.

Results: In both the MCAR and MNAR scenarios, all other models showed poorer prediction performance than three GM models: the saturated one (GM.SAT) and two with penalty factors on the partial likelihood (GM.K1 and GM.TEST). However, GM.SAT predictions could lead to unreliable conclusions in BC survival analysis. Simulating a "synthetic" dataset from GM.SAT could be the worst strategy, but using the remaining GMs could be better than oversampling.

Conclusion: Our results suggest using the presented GM procedure for one-variable imputation/prediction of missing data and for simulating "synthetic" BC survival datasets. The "synthetic" datasets derived from GMs could also be used in clinical applications of cancer survival data, such as predictive risk analysis.
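
The abstract contrasts two missingness mechanisms (MCAR and MNAR) and reports that a saturated graphical model (GM.SAT) predicts a single missing variable well. The sketch below is a minimal, hedged illustration of those two ideas under stated assumptions, not ModGraProDep itself: the toy variables (age group, grade, stage), the masking probabilities and every helper name are hypothetical, and the imputation rule (take the most probable stage given the other variables) is one plausible way to predict with such a model. It relies on a standard fact: for categorical data, the maximum-likelihood fit of the saturated model is simply the table of observed cell counts. The penalized models GM.K1 and GM.TEST are not reproduced.

```python
import random
from collections import Counter

STAGES = ("I", "II", "III")  # hypothetical stage levels


def make_toy_data(n, seed=0):
    """Hypothetical toy registry rows: (age_group, grade, stage)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.choice(["<50", "50-69", "70+"])
        grade = rng.choice(["G1", "G2", "G3"])
        # Stage depends on grade so the conditional distribution is informative.
        weights = [1, 2, 3] if grade == "G3" else [3, 2, 1]
        stage = rng.choices(STAGES, weights=weights)[0]
        rows.append((age, grade, stage))
    return rows


def mask_mcar(rows, p, rng):
    """MCAR: every stage value is deleted with the same probability p."""
    return [(a, g, None if rng.random() < p else s) for a, g, s in rows]


def mask_mnar(rows, p_by_stage, rng):
    """MNAR: the deletion probability depends on the unobserved stage itself."""
    return [(a, g, None if rng.random() < p_by_stage[s] else s) for a, g, s in rows]


def fit_saturated(rows):
    """Saturated model for categorical data: the MLE of the joint distribution
    is the table of observed cell counts (complete cases only)."""
    return Counter(r for r in rows if r[2] is not None)


def impute_stage(counts, age, grade):
    """Most probable stage under P(stage | age, grade), which is proportional
    to the joint cell counts. Ties go to the earlier level in STAGES."""
    scores = [counts[(age, grade, s)] for s in STAGES]
    if sum(scores) == 0:
        return None  # covariate pattern never observed: no information
    return STAGES[scores.index(max(scores))]


def imputation_accuracy(true_rows, masked_rows):
    """Share of masked stages that the saturated-model rule recovers."""
    model = fit_saturated(masked_rows)
    hits = total = 0
    for (a, g, s_obs), (_, _, s_true) in zip(masked_rows, true_rows):
        if s_obs is None:
            total += 1
            hits += impute_stage(model, a, g) == s_true
    return hits / total if total else float("nan")


if __name__ == "__main__":
    data = make_toy_data(2000)
    rng = random.Random(1)
    mcar = mask_mcar(data, 0.2, rng)
    mnar = mask_mnar(data, {"I": 0.05, "II": 0.2, "III": 0.5}, rng)
    print("MCAR accuracy:", round(imputation_accuracy(data, mcar), 3))
    print("MNAR accuracy:", round(imputation_accuracy(data, mnar), 3))
```

Fitting on complete cases gives unbiased cell proportions under MCAR, where the retained records are a random subsample. Under MNAR the retained records over-represent stages that are rarely deleted, so the estimated conditional distribution of stage is biased, which is why the two scenarios are worth assessing separately.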
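
The Results contrast "synthetic" datasets simulated from GM.SAT with those simulated from the penalized GMs. One property a short sketch can show, assuming the saturated fit is left unsmoothed: its fitted joint equals the empirical table, so sampling from it can only reproduce cell combinations already present in the data, whereas a graph with fewer edges factorizes the joint and can generate combinations that were never observed. The clique structure {age} and {grade, stage} and all function names below are hypothetical; this is not the paper's simulation procedure, and the oversampling algorithms it was compared against are not reproduced.

```python
import random
from collections import Counter


def sample_cells(counts, k, rng):
    """Draw k cells with probability proportional to their counts."""
    cells = list(counts)
    return rng.choices(cells, weights=[counts[c] for c in cells], k=k)


def simulate_saturated(rows, k, rng):
    """Sample from the unsmoothed saturated fit (the empirical joint table).
    Only cell combinations already present in `rows` can be produced."""
    return sample_cells(Counter(rows), k, rng)


def simulate_factorized(rows, k, rng):
    """Sample from a hypothetical graph with cliques {age} and {grade, stage},
    i.e. P(age, grade, stage) = P(age) * P(grade, stage), each fitted by counts."""
    ages = sample_cells(Counter(r[0] for r in rows), k, rng)
    grade_stage = sample_cells(Counter((r[1], r[2]) for r in rows), k, rng)
    return [(a, g, s) for a, (g, s) in zip(ages, grade_stage)]


if __name__ == "__main__":
    observed = [("<50", "G1", "I"), ("<50", "G1", "I"), ("70+", "G3", "III")]
    rng = random.Random(0)
    sat = simulate_saturated(observed, 500, rng)
    fac = simulate_factorized(observed, 500, rng)
    print("new combinations (saturated):", set(sat) - set(observed))
    print("new combinations (factorized):", set(fac) - set(observed))
```

The saturated sampler always reports an empty set of new combinations; in this unsmoothed form it behaves like resampling the observed records with replacement. The factorized sampler keeps the grade-stage dependence but recombines it with every observed age group.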

Bibliographic details

  • Source
    Artificial intelligence in medicine | 2020, issue 7 | pp. 101875.1-101875.11 | 11 pages
  • Author affiliations

    Univ Barcelona Secc Estadist Dept Genet Microbiol & Estadist Fac Biol Barcelona 08028 Spain;

    IDIBGI Inst Invest Biomed Girona C Dr Castany S-N Edifici M2 Salt 17190 Spain|Grup Epidemiol Descript Genet & Prevencio Canc Gi Inst Catala Oncol Registre Canc Girona Unitat Epidemiol Pla Director Oncol Girona 17005 Spain;

    IDIBELL Oncol Ave Gran Via 199-203 Lhospitalet De Llobregat 08908 Spain|Univ Barcelona Dept Ciencies Clin Barcelona 08907 Spain;

    MC Mutual Dept Anal & Planificac Recursos Sanitarios Barcelona 08037 Spain|Tech Univ Catalonia Dept Stat Barcelona 08028 Spain|Univ Alicante Publ Hlth Res Grp Alicante 03690 Spain;

    Univ Barcelona Secc Estadist Dept Genet Microbiol & Estadist Fac Biol Barcelona 08028 Spain;

    Hosp Univ St Joan Reus Registre Canc Tarragona Serv Epidemiol & Prevencio Canc IISPV Reus Spain;

    IDIBELL Oncol Ave Gran Via 199-203 Lhospitalet De Llobregat 08908 Spain;

    Univ Barcelona Secc Estadist Dept Genet Microbiol & Estadist Fac Biol Barcelona 08028 Spain;

    Univ Girona UdG Sch Med Girona Spain|Ctr Invest Biomed Red Epidemiol & Salud Publ CIBE Madrid Spain|IDIBGI Inst Invest Biomed Girona C Dr Castany S-N Edifici M2 Salt 17190 Spain|Grup Epidemiol Descript Genet & Prevencio Canc Gi Inst Catala Oncol Registre Canc Girona Unitat Epidemiol Pla Director Oncol Girona 17005 Spain;

    Grup Epidemiol Descript Genet & Prevencio Canc Gi Inst Catala Oncol Registre Canc Girona Unitat Epidemiol Pla Director Oncol Girona 17005 Spain;

    Hosp Univ St Joan Reus Registre Canc Tarragona Serv Epidemiol & Prevencio Canc IISPV Reus Spain;

    IDIBELL Oncol Ave Gran Via 199-203 Lhospitalet De Llobregat 08908 Spain;

    Hosp Univ St Joan Reus Registre Canc Tarragona Serv Epidemiol & Prevencio Canc IISPV Reus Spain;

    Hosp Univ Girona Doctor Josep Trueta Inst Catala Oncol Serv Oncol Med Girona 17005 Spain|Grup Epidemiol Descript Genet & Prevencio Canc Gi Inst Catala Oncol Registre Canc Girona Unitat Epidemiol Pla Director Oncol Girona 17005 Spain;

    IDIBELL Oncol Ave Gran Via 199-203 Lhospitalet De Llobregat 08908 Spain|Univ Barcelona Dept Ciencies Clin Barcelona 08907 Spain;

  • Original format: PDF
  • Full-text language: English
  • Keywords

    Breast cancer; Survival; Graphical models; Missing data; Oversampling; Simulation;
