Information Technology Journal

Efficient Missing Data Technique for Prediction of Nasopharyngeal Carcinoma Recurrence

Abstract

This study investigates efficient missing data techniques (MDTs) for predicting nasopharyngeal carcinoma (NPC) recurrence. Clinical data were first collected from patients with NPC who received treatment at Ramathibodi Hospital, Thailand; in total, 495 records were used for cancer recurrence prediction. Because these data contain various missing values, appropriate MDTs had to be examined. The study focuses on four of them: complete-case analysis, mean imputation, k-nearest neighbor imputation and Expectation Maximization (EM) imputation. The completed data are then used to develop three different predictive models: a single-point model, a multiple-point model and a sequential neural network. The experimental results showed that EM imputation was superior to the other MDTs, providing the highest predictive performance in all models, with an average area under the receiver operating characteristic curve (AUC) of 0.72. The Hosmer and Lemeshow goodness-of-fit test was used to evaluate the fit of each model, and its results confirmed that EM imputation was the best MDT. The sequential neural network outperformed the other models, achieving the best average AUC (0.73) and chi-square statistic (4.30). In addition, survival curves generated from these predictive models were compared with the Kaplan-Meier survival curve. The curves based on EM imputation were closest to the Kaplan-Meier curve; however, the log-rank test showed that they still differed significantly from it (p-value < 0.05).
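The abstract names four missing data techniques but gives no implementation details. The following is a minimal pure-Python sketch of one common formulation of each, not the study's code. Its assumptions are as follows. Records are lists of numbers, with `None` marking a missing value. k-nearest neighbor imputation uses a root-mean-square distance over jointly observed attributes, with k = 2. EM imputation assumes the attributes are jointly multivariate normal. All function names (`complete_case`, `column_means`, `mean_impute`, `knn_impute`, `em_impute` and the helper `inverse`) are hypothetical.

```python
import math


def complete_case(rows):
    """Complete-case analysis: keep only records with no missing value."""
    return [r[:] for r in rows if all(v is not None for v in r)]


def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def mean_impute(rows):
    """Mean imputation: replace each missing value with its column mean."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """k-NN imputation: average column j over the k nearest donors observing it."""
    out = []
    for i, r in enumerate(rows):
        new = r[:]
        for j, v in enumerate(r):
            if v is not None:
                continue
            cands = []
            for t, s in enumerate(rows):
                if t == i or s[j] is None:
                    continue
                # Assumed distance: RMS difference over columns both records observe.
                sq = [(a - b) ** 2 for a, b in zip(r, s) if a is not None and b is not None]
                if sq:
                    cands.append((math.sqrt(sum(sq) / len(sq)), s[j]))
            nearest = sorted(cands)[:k]
            new[j] = sum(val for _, val in nearest) / len(nearest)
        out.append(new)
    return out


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small, non-singular matrices)."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [x / piv for x in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def em_impute(rows, iters=100):
    """EM imputation assuming the records are multivariate normal."""
    d, n = len(rows[0]), len(rows)
    mu = column_means(rows)
    start = mean_impute(rows)
    sigma = [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in start) / n
              for b in range(d)] for a in range(d)]
    for _ in range(iters):
        filled, sum_x = [], [0.0] * d
        sum_xx = [[0.0] * d for _ in range(d)]
        for r in rows:
            obs = [j for j in range(d) if r[j] is not None]
            mis = [j for j in range(d) if r[j] is None]
            x = [0.0 if v is None else v for v in r]
            cond = [[0.0] * d for _ in range(d)]  # Cov(missing | observed)
            if mis:
                # E-step: regress missing on observed, B = S_mo S_oo^-1.
                inv = inverse([[sigma[a][b] for b in obs] for a in obs])
                coef = [[sum(sigma[a][t] * inv[ti][bi] for ti, t in enumerate(obs))
                         for bi in range(len(obs))] for a in mis]
                for ai, a in enumerate(mis):
                    x[a] = mu[a] + sum(coef[ai][bi] * (r[b] - mu[b])
                                       for bi, b in enumerate(obs))
                    for b in mis:
                        cond[a][b] = sigma[a][b] - sum(coef[ai][ti] * sigma[t][b]
                                                       for ti, t in enumerate(obs))
            filled.append(x)
            for a in range(d):
                sum_x[a] += x[a]
                for b in range(d):
                    sum_xx[a][b] += x[a] * x[b] + cond[a][b]
        # M-step: re-estimate mean and covariance from expected statistics.
        mu = [s / n for s in sum_x]
        sigma = [[sum_xx[a][b] / n - mu[a] * mu[b] for b in range(d)] for a in range(d)]
    return filled
```

Complete-case analysis shrinks the dataset, while the three imputation methods keep every record. Consider a two-attribute table where only the second attribute of the last record is missing. There, `em_impute` converges to the least-squares prediction fitted on the complete records, which is the maximum-likelihood value under the normal assumption. The sketch does not guard against singular covariance matrices, which arise when attributes are exactly collinear.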
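The models are compared by AUC and by the Hosmer and Lemeshow goodness-of-fit test. The sketch below shows one standard way to compute both in pure Python. It assumes binary recurrence labels (1 = recurrence) and predicted probabilities strictly between 0 and 1. Records are grouped into equal-size groups ordered by predicted risk (deciles by default). This is a common convention, but the abstract does not confirm it. The names `auc` and `hosmer_lemeshow` are hypothetical. The p-value uses the closed-form chi-square tail, which holds only for an even number of degrees of freedom.

```python
import math


def auc(labels, scores):
    """ROC AUC via pair counting (Mann-Whitney): P(score_pos > score_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each (positive, negative) pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow statistic and p-value with groups - 2 degrees of freedom.

    Records are sorted by predicted probability and cut into equal-size groups;
    probabilities are assumed to lie strictly between 0 and 1.
    """
    df = groups - 2
    if df <= 0 or df % 2:
        # The closed-form p-value below is valid only for even df.
        raise ValueError("groups must be even and at least 4 in this sketch")
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(labels[i] for i in idx)
        expected = sum(probs[i] for i in idx)
        pbar = expected / len(idx)
        stat += (observed - expected) ** 2 / (len(idx) * pbar * (1 - pbar))
    # Chi-square survival function for even df: exp(-x/2) * sum_k (x/2)^k / k!
    half = stat / 2
    pval = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, pval
```

A smaller Hosmer and Lemeshow statistic (and a larger p-value) indicates closer agreement between predicted and observed event counts within the risk groups.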
