Controlling for hidden factors in high dimensional eQTL studies.

Abstract

Finding genetic variants that regulate gene expression now plays a central role in analyzing mechanisms in biological systems, and this role will keep growing as next-generation sequencing technologies generate large amounts of gene expression and genetic marker data. While the unprecedented scale of these data gives scientists the opportunity to answer basic questions about biological systems, the properties of the data raise analysis challenges, particularly for covariate modeling. For example, expression levels of thousands of genes are usually measured in batches, and different batches may be measured under different conditions, which creates the well-known batch effect. Besides this artificial factor, which can affect measurement quality, expression data often reflect environmental regulators that change gene expression levels, such as smoking and drug use. These sources of confounding need to be addressed either before or during data analysis.

In this thesis, I address the analysis issues raised by a particular type of confounding in high-dimensional data: hidden factor effects. Hidden factors are factors that contribute to variation in a large number of measured variables but for which the data contain no direct information. Correcting for hidden factors is critical because, if ignored, they can lead to high false positive rates or reduced power. To tackle this issue, I propose a statistical model that combines multivariate ridge regression and factor analysis to infer both the fixed effects and the hidden confounding. The method is distinctive in that it uses multivariate regression components to infer the associations between the response Y and the covariate X, while maintaining efficiency by sharing the data-reduction property of the factor analysis model.
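
The combination of ridge regression and factor analysis described above can be illustrated with a small sketch. This is a simplified stand-in, not the thesis's algorithm: it assumes the model Y = XB + F + E, where F is a rank-k hidden-factor term, fits B by multivariate ridge regression, and replaces full factor analysis with a truncated SVD (PCA-style) of the residuals, alternating between the two steps. The function names, the simulated data, and the settings (penalty lam, number of factors k, iteration count) are hypothetical.

```python
import numpy as np


def ridge_coefficients(X, Y, lam):
    """Multivariate ridge solution B = (X'X + lam*I)^{-1} X'Y, fit for all genes at once."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)


def low_rank(R, k):
    """Best rank-k approximation of R (Eckart-Young).

    Assumption: used here as a PCA-style stand-in for a factor analysis model.
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def objective(X, Y, B, F, lam):
    """Penalized squared error ||Y - XB - F||^2 + lam * ||B||^2."""
    return np.sum((Y - X @ B - F) ** 2) + lam * np.sum(B ** 2)


def fit_ridge_with_hidden_factors(X, Y, k, lam=1.0, n_iter=50):
    """Hypothetical sketch: alternate ridge regression on X with a rank-k hidden-factor term."""
    F = np.zeros_like(Y)
    history = []
    for _ in range(n_iter):
        B = ridge_coefficients(X, Y - F, lam)  # genetic effects given current factors
        F = low_rank(Y - X @ B, k)             # hidden-factor contribution given effects
        history.append(objective(X, Y, B, F, lam))
    return B, F, history


# Simulated data (hypothetical): SNP dosages, sparse true effects, strong hidden factors.
rng = np.random.default_rng(0)
n_samples, n_snps, n_genes, k = 200, 5, 50, 2
X = rng.binomial(2, 0.3, size=(n_samples, n_snps)).astype(float)
X -= X.mean(axis=0)
B_true = np.zeros((n_snps, n_genes))
B_true[0, :5] = 1.0  # SNP 0 regulates the first five genes
Z = rng.normal(size=(n_samples, k))                  # unobserved factors
W = rng.normal(scale=2.0, size=(k, n_genes))         # factor loadings
Y = X @ B_true + Z @ W + rng.normal(size=(n_samples, n_genes))

B_hat, F_hat, history = fit_ridge_with_hidden_factors(X, Y, k=k, lam=1.0)
B_naive = ridge_coefficients(X, Y, 1.0)              # ignores hidden factors
naive = objective(X, Y, B_naive, np.zeros_like(Y), 1.0)
print(f"naive objective: {naive:.1f}, joint objective: {history[-1]:.1f}")
```

Because each step exactly minimizes the penalized squared error over one block of parameters (ridge for B, truncated SVD for F), the objective never increases across iterations. A proper factor analysis would additionally model gene-specific noise variances and a prior on the factors, which this sketch omits.
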

Compared with other models that address the same issue, this model successfully partitions the covariance structure of the hidden factors, which dramatically improves the power and accuracy of detecting the real associations between X and Y. I also use the model to address hidden factors in the analysis of gene expression levels measured in the lung airway in a sample of individuals, in the context of a genetic association study referred to as an expression Quantitative Trait Loci (eQTL) analysis. I show that the method eliminates the false positives caused by spurious structure (hidden factors) and greatly improves the power to detect true genetic determinants (eQTL) that regulate gene expression in the lung airway. I also apply the method to a challenging Genotype-Environment Interaction (GEI) analysis, where GEI effects are defined as the dependence of genotype-phenotype relationships on environmental factors. I show that, despite the small sample size and highly complicated data structure, my method identifies a large number of interesting GEI associations, many involving genes that other studies have independently found to be highly relevant to lung disease and lung function. These GEI associations carry more information than a typical eQTL because they help identify genetic regulators that behave differently under different environmental pressures, and they therefore form an interesting set of candidate genes for clinical scientists.
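
The GEI definition above corresponds to the standard interaction regression y = b0 + bG*g + bE*e + bGE*(g*e) + noise, where a nonzero bGE means the genotype effect depends on the environment. The sketch below fits this model for a single gene-SNP pair by ordinary least squares and tests bGE with a Wald statistic, using a normal approximation for the p-value. It is an illustration under stated assumptions: it does not include the hidden-factor correction the thesis applies, and the simulated genotype, exposure, and effect sizes are hypothetical.

```python
import math

import numpy as np


def gei_interaction_test(y, g, e):
    """Fit y = b0 + bG*g + bE*e + bGE*(g*e) + noise by OLS.

    Returns the coefficients, the Wald statistic for bGE, and a two-sided
    p-value from a normal approximation. No hidden-factor correction here.
    """
    X = np.column_stack([np.ones_like(g), g, e, g * e])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)              # OLS covariance of beta
    t_stat = beta[3] / math.sqrt(cov[3, 3])
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, t_stat, p_value


# Hypothetical simulation: genotype effect is larger in exposed individuals.
rng = np.random.default_rng(1)
n = 2000
g = rng.binomial(2, 0.3, size=n).astype(float)  # SNP dosage 0/1/2
e = rng.binomial(1, 0.5, size=n).astype(float)  # binary exposure, e.g. smoker
y = 0.5 + 0.3 * g + 0.2 * e + 1.0 * g * e + rng.normal(size=n)

beta, t_stat, p_value = gei_interaction_test(y, g, e)
print(beta.round(2), round(t_stat, 1), p_value)
```

In a genome-scale GEI scan this test would be repeated over many gene-SNP pairs, so the resulting p-values would also need multiple-testing control.
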

Bibliographic Details

Author: Gao, Chuan
Affiliation: Cornell University
Degree-granting institution: Cornell University
Subjects: Biology, Biostatistics; Biology, Bioinformatics; Statistics
Degree: Ph.D.
Year: 2012
Pages: 106
Original format: PDF
Language: English

Similar Documents

1. HEFT: eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors [J]. Gao Chuan, Tignor Nicole L., Salit Jacqueline. Bioinformatics, 2014, no. 3.
2. Robust Bayesian FDR Control Using Bayes Factors, with Applications to Multi-tissue eQTL Discovery [J]. Xiaoquan Wen. Statistics in Biosciences, 2017, no. 1.
3. Discovering Hidden Controlling Parameters using Data Analytics and Dimensional Analysis (70th Annual Meeting of the APS Division of Fluid Dynamics) [J]. Zachary del Rosario, Minyong Lee, Gianluca Iaccarino. Bulletin of the American Physical Society, 2017, no. 14.
4. Control of electrochemical dimensional processing on the basis of synergy of controllable factors [C]. A R Zakirova, Z B Sadykov. International Scientific-Technical Conference on Innovative Engineering Technologies, Equipment and Materials, 2018.
5. Two-photon fabricated scaffolds for controlled three-dimensional cell migration studies [D]. Tayalia, Prakriti. 2009.
6. HEFT: eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors [O]. Chuan Gao, Nicole L. Tignor, Jacqueline Salit.
7. Controlling for Hidden Factors in High Dimensional eQTL Studies [O]. Gao Chuan. 2012.
