Journal: ACM Transactions on Knowledge Discovery from Data

Kernelized Information-Theoretic Metric Learning for Cancer Diagnosis Using High-Dimensional Molecular Profiling Data

Abstract

With the advancement of genome-wide monitoring technologies, molecular expression data have become widely used for diagnosing cancer from tumor or blood samples. When mining molecular signature data, comparing samples through an adaptive distance function is fundamental but difficult, because such datasets are typically heterogeneous and high dimensional. In this article, we present kernelized information-theoretic metric learning (KITML) algorithms that optimize a distance function to tackle the cancer diagnosis problem and that scale to high dimensionality. By implicitly learning a nonlinear transformation of the input space through kernelization, KITML permits efficient optimization, low storage, and improved learning of the distance metric. We propose two novel applications of KITML for diagnosing cancer using high-dimensional molecular profiling data: (1) for sample-level cancer diagnosis, the learned metric is used to improve the performance of k-nearest neighbor classification; and (2) for estimating the severity level or stage of a group of samples, we propose a novel set-based ranking approach that extends KITML. For the sample-level cancer classification task, we evaluated KITML on 14 cancer gene microarray datasets and compared it with eight other state-of-the-art approaches. The results show that our approach achieves the best overall performance for molecular-expression-driven cancer sample diagnosis. For group-level cancer stage estimation, we tested the proposed set-KITML approach on three multi-stage cancer microarray datasets, and it correctly estimated the stages of the sample groups in all three studies.
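Application (1) above, k-nearest neighbor classification under a kernelized metric, can be illustrated with a minimal sketch. It is not the authors' KITML implementation: it assumes a kernel matrix `K` over all samples has already been obtained (here a plain linear kernel on toy data stands in for a learned one), and the function names `kernel_sq_distances` and `kernel_knn_predict` are hypothetical. The only fact it relies on is that a kernel induces the squared distance d²(i, j) = K[i, i] + K[j, j] − 2·K[i, j].

```python
import numpy as np

def kernel_sq_distances(K, rows, cols):
    """Squared distances induced by kernel matrix K between two index sets."""
    diag = np.diag(K)
    return diag[rows][:, None] + diag[cols][None, :] - 2.0 * K[np.ix_(rows, cols)]

def kernel_knn_predict(K, train_idx, train_labels, test_idx, k=3):
    """Majority-vote k-NN using only entries of a precomputed kernel matrix."""
    train_idx = np.asarray(train_idx)
    test_idx = np.asarray(test_idx)
    train_labels = np.asarray(train_labels)
    d2 = kernel_sq_distances(K, test_idx, train_idx)
    # Positions (within the training set) of the k closest training samples.
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for row in nearest:
        values, counts = np.unique(train_labels[row], return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)

# Toy data (an assumption for illustration, not from the article):
# two well-separated clusters, with the last two rows held out as test samples.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],
              [0.5, 0.5], [5.5, 5.5]])
K = X @ X.T  # linear kernel standing in for a learned kernel matrix
preds = kernel_knn_predict(K, train_idx=[0, 1, 2, 3, 4, 5],
                           train_labels=[0, 0, 0, 1, 1, 1],
                           test_idx=[6, 7], k=3)
print(preds)
```

Because the classifier touches only kernel entries, a learned kernel matrix could be swapped in for `K` without ever forming the high-dimensional metric explicitly, which is the kind of storage saving the abstract attributes to kernelization.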
