Partition models for variable selection and interaction detection.

Abstract

Variable selection methods play important roles in modeling high-dimensional data and are key to data-driven scientific discovery. In this thesis, we consider the problem of variable selection with interaction detection. Instead of building a predictive model of the response given combinations of predictors, we start by modeling the conditional distribution of the predictors given partitions based on the response. We use this inverse modeling perspective as motivation to propose a stepwise procedure that effectively detects interactions with few assumptions on parametric form. Under moderate conditions, the proposed procedure can detect pairwise interactions among p predictors in O(p) rather than O(p^2) computational time. We establish the variable selection consistency of the proposed procedure when both the number of predictors and the sample size diverge. Through simulation studies and real data examples, we demonstrate its excellent empirical performance in comparison with several existing methods.

Next, we combine the forward and inverse modeling perspectives under the Bayesian framework to detect pleiotropic and epistatic effects in expression quantitative trait loci (eQTL) studies. We augment the Bayesian partition model proposed by Zhang et al. (2010) to capture the complex dependence structure among gene expression and genetic markers. In particular, we propose a sequential partition prior to model the asymmetric roles played by the response and the predictors, and we develop an efficient dynamic programming algorithm for sampling latent individual partitions. In both simulations and real data examples pertaining to yeast, the augmented partition model significantly improves the power to detect eQTLs compared with previous methods.

Finally, we study the application of Bayesian partition models to the unsupervised learning of transcription factor (TF) families from protein binding microarray (PBM) data. The problem of TF subclass identification can be viewed as clustering TFs with variable selection on the DNA sequences they bind. Our model simultaneously identifies TF families and their shared sequence preferences, as well as the DNA sequences bound preferentially by individual members of each family. Our analysis may aid in deciphering cis-regulatory codes and the determinants of protein-DNA binding specificity.
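The inverse modeling idea in the first part of the abstract, which models the predictors given slices of the response, can be illustrated with a small sketch. This is not the thesis's stepwise procedure. It is a hypothetical, stdlib-only illustration that assumes Gaussian predictors and equal-size slices of the response. The helpers `slice_by_response` and `inverse_screen` are invented names. For each predictor X_j, the sketch compares three nested Gaussian models of X_j given the slice label: one common distribution, slice-specific means, and slice-specific means and variances. A predictor that acts only through an interaction (x1 or x2 in the toy model y = x0 + 2·x1·x2) shows essentially no mean shift across slices but does show a change in variance. This is one way an inverse model can flag predictors involved in interactions while scoring each predictor separately, so the work grows linearly in p instead of running over all pairs.

```python
import math
import random


def slice_by_response(y, n_slices):
    """Assign each observation to one of n_slices equal-size slices by the rank of y."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank * n_slices // len(y)
    return labels


def _mean_and_mle_var(values):
    """Sample mean and maximum-likelihood (divide-by-n) variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


def inverse_screen(x_col, labels, n_slices):
    """Score one predictor by modeling X_j given the response slice (Gaussian assumption).

    Returns (mean_stat, var_stat), each twice a log-likelihood gain:
      mean_stat: slice-specific means vs. one common Gaussian
      var_stat:  slice-specific means and variances vs. slice-specific means only
    Assumes every slice holds at least two distinct values.
    """
    n = len(x_col)
    _, pooled_var = _mean_and_mle_var(x_col)
    groups = [[] for _ in range(n_slices)]
    for value, h in zip(x_col, labels):
        groups[h].append(value)
    within_ss = 0.0    # sum over slices of n_h * var_h
    sum_log_var = 0.0  # sum over slices of n_h * log(var_h)
    for g in groups:
        _, var_h = _mean_and_mle_var(g)
        within_ss += len(g) * var_h
        sum_log_var += len(g) * math.log(var_h)
    within_var = within_ss / n
    mean_stat = n * (math.log(pooled_var) - math.log(within_var))
    var_stat = n * math.log(within_var) - sum_log_var
    return mean_stat, var_stat


if __name__ == "__main__":
    rng = random.Random(1)
    n, p, n_slices = 4000, 5, 5
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Hypothetical model: x0 has a main effect, x1 and x2 interact, x3 and x4 are noise.
    y = [r[0] + 2.0 * r[1] * r[2] + 0.1 * rng.gauss(0.0, 1.0) for r in X]
    labels = slice_by_response(y, n_slices)
    for j in range(p):  # one pass per predictor: cost linear in p
        mean_stat, var_stat = inverse_screen([r[j] for r in X], labels, n_slices)
        print(f"x{j}: mean_stat={mean_stat:8.1f}  var_stat={var_stat:8.1f}")
```

For a predictor unrelated to the response, each statistic is approximately chi-squared with H - 1 degrees of freedom under the Gaussian assumption, where H is the number of slices. That gives one possible cutoff. The thesis's own thresholds and stepwise rules are not reproduced here.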

Bibliographic details

  • Author: Jiang, Bo.
  • Affiliation: Harvard University.
  • Degree-granting institution: Harvard University.
  • Discipline: Statistics.
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 190 p.
  • Original format: PDF
  • Language of text: English
