
Approaches to reduce and integrate data in structured and high-dimensional regression problems in genomics.


Abstract

Analysis of high-dimensional data has become increasingly important in several fields of the sciences and engineering. This is particularly true for Genomics, with its expanding repertoire of high-throughput technologies. For many regression-like analyses, dimension reduction in the predictor space can be very effective. The most commonly used approaches assume that predictors and samples are similar in nature and can simultaneously participate in the reduction. However, recent high-throughput genomic data is often heterogeneous and structured; for instance, both samples and predictors may be labeled based on their origin and/or on information available about their nature or function. Exploiting known structure in samples and predictors when performing dimension reduction can be an avenue for integrating data collected through multiple studies and diverse high-throughput platforms.

To address this challenge, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured SDR, and one methodology to pursue it, structured Ordinary Least Squares (sOLS), that is effective and parsimonious even when, as is the case in many Genomics applications, the number of available samples is small relative to the number of predictors. sOLS combines ideas from the existing SDR literature to merge reductions performed within groups of samples and/or predictors. Importantly, it utilizes a novel version of OLS for grouped predictors that requires far less computation than other recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. In addition, we extend our approach and methodology to tackle regressions with binary or multivariate responses, as well as regressions with correlated observations. These extensions expand the application scope of structured SDR, for example to classification problems and the analysis of spatial data. They, too, are demonstrated through simulations and applications in Genomics and Health Care.

This dissertation holds the promise of providing the Genomics community with an effective data reduction and integration approach, and may also have broad applicability to complex data from other scientific fields.

Bibliographic record

  • Author

    Liu, Yang

  • Author affiliation

    The Pennsylvania State University

  • Degree-granting institution The Pennsylvania State University
  • Discipline Statistics
  • Degree Ph.D.
  • Year 2015
  • Pages 133 p.
  • Total pages 133
  • Original format PDF
  • Text language English
