Journal of Big Data

Lifelong Machine Learning and root cause analysis for large-scale cancer patient data



Abstract

Introduction

This paper presents a lifelong learning framework that constantly adapts to changing data patterns over time through an incremental learning approach. In many big data systems, iteratively re-training on high-dimensional data from scratch is computationally infeasible, since constant data-stream ingestion on top of a historical data pool increases training time exponentially. The need therefore arises to retain past learning and update the model incrementally and quickly as new data arrive. Moreover, current machine learning approaches make predictions without providing a comprehensive root cause analysis. To resolve these limitations, our framework is founded on an ensemble process that combines stream data with historical batch data in an incremental lifelong machine learning (LML) model.

Case description

A cancer patient's pathological tests, such as blood, DNA, urine, or tissue analysis, provide a unique signature based on DNA combinations. Our analysis enables personalized and targeted medication and achieves a therapeutic response. The model is evaluated on data from The National Cancer Institute's Genomic Data Commons unified data repository. The aim is to prescribe personalized medicine based on the thousands of genotype and phenotype parameters recorded for each patient.

Discussion and evaluation

The model uses a dimension-reduction method to reduce training time in an online sliding-window setting. We identify the Gleason score as a determining factor for cancer possibility and substantiate this claim through Lilliefors and Kolmogorov–Smirnov tests. We present clustering and Random Decision Forest results, and the model's prediction accuracy is compared with standard machine learning algorithms on numeric and categorical fields.

Conclusion

We propose an ensemble framework of stream and batch data for incremental lifelong learning. The framework first applies a streaming clustering technique and then a Random Decision Forest Regressor/Classifier to isolate anomalous patient data, and it provides reasoning through root cause analysis based on feature correlations, with the aim of improving the overall survival rate. While the stream clustering technique creates groups of patient profiles, the RDF drills down into each group for comparison and reasoning, yielding actionable insights. The proposed MALA architecture retains past learned knowledge, transfers it to future learning, and iteratively becomes more knowledgeable over time.
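The two-stage pipeline described in the conclusion can be sketched roughly as follows. This is a minimal illustration, not the paper's MALA implementation: it assumes scikit-learn's `MiniBatchKMeans` (whose `partial_fit` retains past learning as batches stream in) in place of the paper's streaming clustering, and the feature names and anomaly rule (distance from a cluster center) are invented for the example.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["gleason_score", "psa_level", "age"]  # illustrative names only

# Simulated patient profiles arriving as a stream of mini-batches.
def patient_batches(n_batches=5, batch_size=200):
    for _ in range(n_batches):
        yield rng.normal(loc=[6.0, 4.0, 65.0], scale=[2.0, 1.5, 8.0],
                         size=(batch_size, 3))

# Stage 1: incremental stream clustering. partial_fit updates the model
# batch by batch, so earlier learning is retained rather than retrained.
clusterer = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3)
history = []
for batch in patient_batches():
    clusterer.partial_fit(batch)
    history.append(batch)

X = np.vstack(history)
labels = clusterer.predict(X)

# Stage 2: within each cluster of patient profiles, fit a Random Decision
# Forest to separate "anomalous" profiles (here a simple distance-based
# proxy) and read feature importances as a rough root-cause signal.
for c in range(3):
    Xc = X[labels == c]
    dist = np.linalg.norm(Xc - clusterer.cluster_centers_[c], axis=1)
    y = (dist > np.quantile(dist, 0.9)).astype(int)  # top 10% flagged
    rdf = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xc, y)
    ranked = sorted(zip(features, rdf.feature_importances_),
                    key=lambda t: -t[1])
    print(f"cluster {c}: top factor = {ranked[0][0]}")
```

The design point the sketch mirrors is the division of labor: the stream clusterer groups profiles cheaply and incrementally, while the per-cluster forest does the finer-grained comparison and reasoning within each group.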
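The distribution check mentioned in the evaluation (Lilliefors and Kolmogorov–Smirnov tests on the Gleason score) can be reproduced in outline with SciPy and statsmodels. The scores below are synthetic stand-ins; the actual GDC patient data is not reproduced here, and the chosen mean and spread are illustrative.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(1)
# Synthetic Gleason-like scores; illustrative parameters, not GDC data.
gleason = rng.normal(loc=7.0, scale=1.2, size=500)

# Kolmogorov–Smirnov test against a fully specified normal distribution
# (mean and standard deviation given in advance).
ks_stat, ks_p = stats.kstest(gleason, "norm", args=(7.0, 1.2))

# Lilliefors test: the KS variant whose null-distribution parameters are
# estimated from the sample itself, which changes the critical values.
lf_stat, lf_p = lilliefors(gleason, dist="norm")

print(f"KS: D={ks_stat:.3f}, p={ks_p:.3f}")
print(f"Lilliefors: D={lf_stat:.3f}, p={lf_p:.3f}")
```

The distinction matters for the claim in the abstract: when the normality parameters are estimated from the patient data rather than known a priori, the plain KS p-value is too lenient, which is why the Lilliefors correction is run alongside it.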
