Conference: IEEE International Conference on Bioinformatics and Biomedicine

Analysis of Meta-Learning Approaches for TCGA Pan-cancer Datasets

Abstract

Cancer has been characterized as a heterogeneous disease, and classifying cancer subtypes has become a necessity in cancer research because it facilitates the subsequent clinical management of patients and provides decision support for clinicians. With the advance of machine learning over the last decade, many researchers have employed it to tackle the cancer classification problem. However, traditional machine learning algorithms require a large amount of annotated data for model training, and collecting such data is time-consuming, expensive, and often unrealistic in real-world settings. Meta-learning has been proposed to address this data scarcity. It uses prior knowledge learned from related tasks to generalize to new tasks with limited supervised experience, and it has been applied in many fields where annotated data are scarce, such as few-shot image classification and drug discovery. Data scarcity is also common in cancer research and diagnosis studies, yet only a few previous studies have classified cancers based on limited annotated data. We explore the meta-learning algorithm MAML for the scenario where only limited annotated data are available. Our objective in this work is to comprehensively compare MAML with few-shot learning methods (matching network and prototypical network) and traditional machine learning methods (random forest and K-nearest neighbor). Experimental results on The Cancer Genome Atlas (TCGA) cancer patient data demonstrate the effectiveness and superiority of MAML over the other methods, including its ability to outperform them while using 4.5-fold fewer features.
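
The abstract does not describe the authors' implementation, so the following is only a minimal sketch of the general MAML idea (learn an initialization that adapts well to a new task after one gradient step on a few labeled examples). It assumes a toy one-parameter regression problem rather than TCGA data, and every name in it (`sample_task`, `adapt`, `maml_train`, the step sizes `alpha` and `beta`) is hypothetical and chosen for illustration.

```python
import random

def grad(theta, xs, ys):
    """Gradient of the mean squared error for the model y_hat = theta * x."""
    return sum(2 * x * (theta * x - y) for x, y in zip(xs, ys)) / len(xs)

def hessian(xs):
    """Second derivative of the same loss; it does not depend on theta here."""
    return sum(2 * x * x for x in xs) / len(xs)

def adapt(theta, xs, ys, alpha):
    """Inner loop: one gradient step on a task's small support set."""
    return theta - alpha * grad(theta, xs, ys)

def sample_task(rng, k_shot):
    """Toy task (an assumption, not from the paper): y = w * x with a random slope w."""
    w = rng.uniform(1.0, 3.0)

    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(k_shot)]
        return xs, [w * x for x in xs]

    return draw(), draw()  # (support set, query set)

def maml_train(steps=500, tasks_per_step=8, k_shot=5, alpha=0.1, beta=0.05, seed=0):
    """Outer loop: update the shared initialization theta across many tasks."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            (sx, sy), (qx, qy) = sample_task(rng, k_shot)
            theta_task = adapt(theta, sx, sy, alpha)
            # Chain rule through the inner step:
            # d(theta_task)/d(theta) = 1 - alpha * hessian(support set).
            meta_grad += grad(theta_task, qx, qy) * (1 - alpha * hessian(sx))
        theta -= beta * meta_grad / tasks_per_step
    return theta
```

In this toy setting the learned initialization should settle near the middle of the slope range, so a single `adapt` call on a handful of examples from a new task moves it toward that task's slope. A real few-shot cancer classifier would replace the scalar model with a neural network and the analytic derivatives with automatic differentiation.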