Conference: Medical Informatics in Europe Conference

Canonical Correlation Analysis for Data Reduction in Data Mining Applied to Predictive Models for Breast Cancer Recurrence

Abstract

Data mining methods can be used to extract specific medical knowledge, such as important predictors of breast cancer recurrence, from pertinent data. However, when the data contain a very large number of variables, the important variables must first be identified and selected. In this study we present a preprocessing method for selecting important variables in a dataset before building a predictive model. The dataset comprised data from 5787 female patients. To cover more predictors and obtain a better assessment of the outcomes, data were retrieved from three different registers: the regional breast cancer, tumour markers, and cause of death registers. After information about the selected predictors and outcomes had been retrieved from the registers, the raw data were cleaned by applying a set of logical rules. Domain experts then selected the predictors assumed to be important for breast cancer recurrence. Next, Canonical Correlation Analysis (CCA) was applied as a dimension reduction technique that preserves the character of the original data. An Artificial Neural Network (ANN) was applied to the resulting dataset for two different analyses with the same settings. The performance of the predictive models was evaluated by ten-fold cross-validation. The results showed an increase in prediction accuracy and a reduction in mean absolute error.
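The abstract does not say how the variable sets for CCA were formed or how the ANN was configured, so the following is only a minimal sketch of the general idea, under stated assumptions: a predictor matrix `X` is projected onto its canonical variates with respect to an outcome matrix `Y`, and the reduced data are scored by ten-fold cross-validation using mean absolute error. The data are synthetic, all function and variable names are hypothetical, and a plain least-squares model stands in for the ANN to keep the example dependency-light. None of this is the authors' implementation.

```python
import numpy as np

def cca_fit(X, Y, n_components, reg=1e-8):
    """Fit CCA on predictors X and outcomes Y; return the X-side projection."""
    x_mean = X.mean(axis=0)
    Xc, Yc = X - x_mean, Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariance blocks; a small ridge term keeps the Cholesky factors stable.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    # Whitened cross-covariance K = Lx^{-1} Sxy Ly^{-T}; its singular values
    # are the canonical correlations.
    K = np.linalg.solve(Ly, np.linalg.solve(Lx, Sxy).T).T
    U, corrs, _ = np.linalg.svd(K, full_matrices=False)
    # Map the whitened directions back to the original predictor space.
    weights = np.linalg.solve(Lx.T, U[:, :n_components])
    return x_mean, weights, corrs[:n_components]

def cca_transform(X, x_mean, weights):
    """Project predictors onto the fitted canonical directions."""
    return (X - x_mean) @ weights

def ten_fold_mae(X, Y, target, n_components, n_folds=10, seed=0):
    """Cross-validated MAE; CCA is refitted on each training fold only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        x_mean, weights, _ = cca_fit(X[train], Y[train], n_components)
        Z_train = cca_transform(X[train], x_mean, weights)
        Z_test = cca_transform(X[test], x_mean, weights)
        # Stand-in model (assumption): least squares with an intercept, not an ANN.
        A_train = np.column_stack([np.ones(len(train)), Z_train])
        coef, *_ = np.linalg.lstsq(A_train, target[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), Z_test]) @ coef
        errors.append(np.mean(np.abs(pred - target[test])))
    return float(np.mean(errors))

# Synthetic data (assumption): 8 predictors, 2 outcomes driven by one latent factor.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
latent = X[:, 0] + 0.5 * X[:, 1]
Y = np.column_stack([
    latent + rng.normal(scale=0.5, size=500),
    -latent + rng.normal(scale=0.5, size=500),
])

x_mean, weights, corrs = cca_fit(X, Y, n_components=2)
Z = cca_transform(X, x_mean, weights)  # 8 predictors reduced to 2 variates
mae = ten_fold_mae(X, Y, target=Y[:, 0], n_components=2)
print("canonical correlations:", corrs)
print("ten-fold MAE on reduced data:", mae)
```

Refitting CCA inside each training fold, rather than once on the full dataset, keeps outcome information from the held-out fold out of the projection, which would otherwise make the cross-validated error look better than it is.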