Learning with Kernels and Graphs to Understand Cancer DNA Copy Number Variations.

Abstract

DNA copy number variations (CNVs) are biological indicators that characterize cancer genomes. Predicting cancer prognosis from CNVs and identifying cancer-causing CNVs are challenging problems because CNV features are high-dimensional and patients are heterogeneous. In this thesis, our objective is to build robust predictive models from CNV data using machine learning techniques, both for accurate cancer diagnosis and prognosis and for the identification of cancer-causing CNVs.

We proposed several machine learning models towards these objectives:

1. We developed HyperPrior, a hypergraph-based semi-supervised learning algorithm for cancer outcome prediction from CNV data and gene expression data. It incorporates biological prior knowledge, such as the spatial information in arrayCGH datasets, to obtain consistent weights on correlated genomic features, thereby improving the accuracy of the model in sample classification. The algorithm can also be used to detect biomarkers or cancer-causing CNVs.

2. We developed an alignment-based kernel method for integrating CNV data from multiple platforms. By integrating datasets generated from different probe sets, the new kernel could improve cancer outcome prediction by the SVM classifier. Building on this alignment kernel, we also designed a multiple alignment approach to identify CNVs shared among cancer samples, which served as candidate cancer-causing CNVs for further analysis.

3. We proposed an algorithm that learns a low-rank graph to represent the similarities between data points. This low-rank graph could capture the global cluster structures and improve the performance of label propagation. The whole approach can be applied to arrayCGH datasets as well as to other types of datasets for better sample classification results.

4. We proposed a latent feature model that couples sparse sample group selection with fused lasso. Clinical information was used to define the group structure on patient samples. Through sparse group selection, the model was able to identify group-specific CNVs, rather than common CNVs, from arrayCGH datasets.

We used both simulations and several publicly available genomic datasets to evaluate our models. The results suggest that these models are promising in achieving better cancer prognosis prediction and identification of cancer-causing CNVs.
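The abstract mentions label propagation on a similarity graph but does not describe the thesis's own algorithm. As a rough illustration only, the sketch below implements the standard closed-form label propagation on a fixed graph (scores F = (I - αS)⁻¹ y with S the symmetrically normalized weight matrix). It is not the low-rank graph learning method from the thesis. The function name `label_propagation`, the parameter `alpha`, and the toy graph are assumptions made for this example.

```python
import numpy as np


def label_propagation(W, y, alpha=0.9):
    """Standard closed-form label propagation on a fixed graph (illustrative).

    W     : symmetric non-negative similarity matrix (n x n), no isolated nodes
    y     : initial labels, +1 / -1 for labeled nodes and 0 for unlabeled nodes
    alpha : assumed propagation strength in (0, 1)
    """
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    # Solve (I - alpha * S) F = y instead of forming the inverse explicitly
    return np.linalg.solve(np.eye(n) - alpha * S, np.asarray(y, dtype=float))


# Toy graph (hypothetical data): clusters {0, 1, 2} and {3, 4, 5}
# joined by a single weak edge between nodes 2 and 3.
W = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
y = np.array([1, 0, 0, 0, 0, -1])  # only nodes 0 and 5 are labeled

scores = label_propagation(W, y)
predicted = np.sign(scores)
print(predicted)
```

In this toy case the nodes in the first cluster receive positive scores and those in the second cluster receive negative scores, so the two labeled samples are enough to classify the rest. How well this works in practice depends on how well the graph reflects the cluster structure, which is the motivation the abstract gives for learning the graph.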

Bibliographic details

  • Author: Tian, Ze
  • Author affiliation: University of Minnesota
  • Degree-granting institution: University of Minnesota
  • Subjects: Computer science; Oncology; Genetics; Bioinformatics
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 133 p.
  • Total pages: 133
  • Original format: PDF
  • Language: English (eng)
