Journal: 《管理科学》 (Management Science)

Predicting Colorectal Cancer Survivability with an Ensemble Classification Method Based on an Improved Random Forest


Abstract

Cancer is one of the leading causes of human death, and in many countries it accounts for a large share of total healthcare expenditure. Survivability prediction is an important part of cancer prognosis. It can help physicians make more accurate diagnostic and treatment decisions and thereby lower treatment costs. Data-driven methods for predicting cancer survivability have come into increasing use in recent years. Prediction accuracy is the main criterion for evaluating such methods, so improving it remains an active research area. This paper focuses on colorectal cancer, which has both high incidence and high mortality. To predict colorectal cancer survivability more accurately, the paper uses a genetic algorithm (GA) to improve the random forest (RF) and proposes an ensemble classification method based on GA-RF. In this method, the GA performs an evolutionary search over the decision trees of a random forest. Its objective is higher ensemble classification accuracy, and it selects a satisfactory ensemble, that is, a subset of the trees. In the experiments, three methods are used to train models that predict the survivability of colorectal cancer patients: the GA-RF ensemble method, a decision tree, and a parameter-optimized random forest. All three are evaluated by 10-fold cross-validation on the colorectal cancer data set from the SEER database. They are then compared on three metrics: accuracy, sensitivity, and specificity. The GA-RF ensemble method achieves the highest prediction accuracy (88.2%). The parameter-optimized random forest ranks second (86.4%), but its ensemble is far more complex than that of the GA-RF method. The decision tree performs worst (74.2%). The GA-RF method also shows the best generalization performance. The proposed method is therefore an effective improvement on the random forest. It predicts colorectal cancer survivability with higher computational efficiency and better accuracy. It can serve as a decision-making reference for colorectal cancer prognosis and compensates for the shortcomings of experience-based prediction. It also has practical value for saving medical resources, reducing medical costs, and improving patient satisfaction.
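The abstract does not give the GA's encoding, operators, or parameters. As a rough illustration only, the sketch below assumes a common formulation of GA-based ensemble selection. Each candidate is a bit mask over the forest's trees. Fitness is the majority-vote accuracy of the selected trees on a held-out validation set. Ties in fitness are broken toward fewer trees; that tie-break is our own assumption, motivated by the abstract's remark on ensemble complexity. New candidates come from tournament selection, one-point crossover, and bit-flip mutation. The function names, `pop_size`, `generations`, `p_mut`, and the input format are all hypothetical. The input is one list of 0/1 predictions per tree, which could be obtained, for example, by calling `predict` on each entry of scikit-learn's `RandomForestClassifier.estimators_`.

```python
import random
from typing import List, Sequence


def majority_vote_accuracy(tree_preds: Sequence[Sequence[int]],
                           mask: Sequence[int],
                           y_true: Sequence[int]) -> float:
    """Validation accuracy of the majority vote over trees whose mask bit is 1 (labels 0/1)."""
    chosen = [p for p, bit in zip(tree_preds, mask) if bit]
    if not chosen:
        return 0.0  # an empty ensemble cannot classify anything
    correct = 0
    for i, label in enumerate(y_true):
        votes_for_1 = sum(p[i] for p in chosen)
        # Ties (possible with an even number of trees) go to class 1, an arbitrary choice.
        vote = 1 if 2 * votes_for_1 >= len(chosen) else 0
        correct += int(vote == label)
    return correct / len(y_true)


def ga_select_trees(tree_preds: Sequence[Sequence[int]],
                    y_val: Sequence[int],
                    pop_size: int = 20,
                    generations: int = 50,
                    p_mut: float = 0.05,
                    seed: int = 0) -> List[int]:
    """Hypothetical GA that searches bit masks over trees for a more accurate sub-ensemble."""
    rng = random.Random(seed)
    n = len(tree_preds)

    def fitness(mask: List[int]):
        # Primary goal: validation accuracy; tie-break: fewer trees (assumption).
        return (majority_vote_accuracy(tree_preds, mask, y_val), -sum(mask))

    # Seed the population with the full forest (all ones) plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]

    def tournament() -> List[int]:
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each tree is toggled with probability p_mut.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Toy usage: one accurate tree, three inverted trees, one constant tree.
y_val = [1, 0, 1, 1, 0, 0, 1, 0]
tree_preds = [list(y_val)] + [[1 - y for y in y_val]] * 3 + [[1] * 8]
mask = ga_select_trees(tree_preds, y_val)
print(mask, majority_vote_accuracy(tree_preds, mask, y_val))
```

In the toy data the full five-tree vote scores 0.0 on the validation set, while the single accurate tree alone scores 1.0. This is the kind of gap that subset selection exploits. The population includes the full forest, and elitism keeps the best mask in every generation. Together these guarantee that the returned subset is never less accurate on the validation set than the full forest. Whether the subset also generalizes better still has to be checked on data the GA did not see.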

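The evaluation protocol described in the abstract is 10-fold cross-validation scored by accuracy, sensitivity, and specificity. It can be sketched in plain Python with the standard definitions: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The abstract does not say which class is positive or how the folds were built. Here we assume label 1 (for example, "survived") is the positive class and use shuffled, unstratified folds. The helper names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity), treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return accuracy, sensitivity, specificity


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 and split into k folds; return one (train, test) index pair per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample, so fold sizes differ by at most 1.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        splits.append((train, test))
        start += size
    return splits


# Toy usage: TP = 2, FN = 1, TN = 1, FP = 1.
print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

In a full run, each model would be trained on the `train` indices of a fold and scored with `binary_metrics` on the matching `test` indices. The ten per-fold scores would then typically be averaged. Returning 0.0 when a fold contains no positive or no negative cases is a convention chosen for this sketch. Stratified folds avoid that situation on imbalanced survival data.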