
Bayesian Biclustering on Discrete Data: Variable Selection Methods



Abstract

Biclustering is a technique for clustering the rows and columns of a data matrix simultaneously. Over the past few years, it has been applied in biology-related fields as well as in many data mining projects. Unlike classical clustering methods, biclustering groups objects that are similar only on a subset of variables. Many biclustering algorithms for continuous data have emerged over the last decade. This dissertation focuses on two Bayesian biclustering algorithms we developed for discrete data, specifically categorical data and ordinal data.

The International HapMap Project has made available the single-nucleotide polymorphism (SNP) data of thousands of individuals across the world. We analyzed the SNP data with our biclustering algorithm for categorical data and described the similarities between human populations. In contrast to existing methods, our method can locate the SNPs that are specific to subpopulation groups. This can provide insight into genome-wide association studies (GWAS) by eliminating SNPs that are common to different ethnic groups. We also identified a number of SNPs that are linked to disease, and this thesis describes their behavior in different subpopulations. The biclustering process can be used as a variable selection step prior to existing population inference procedures.

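Bibliographic record

  • Author

    Guo, Lei

  • Author affiliation

    Harvard University

  • Degree-granting institution: Harvard University
  • Subject: Statistics; Biostatistics
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 143 p.
  • Total pages: 143
  • Original format: PDF
  • Language of text: English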
