Adapting Data Adaptive Methods for Small, but High Dimensional Omic Data: Applications to GWAS/EWAS and More

Abstract

Exploratory analysis of high-dimensional "omics" data has received much attention since the explosion of high-throughput technology made it possible to screen tens of thousands of characteristics simultaneously (genomics, metabolomics, proteomics, adducts, etc.). Part of this trend has been an increase in the dimension of exposure data in studies of environmental exposure and associated biomarkers. Though some of the general approaches, such as GWAS, are transferable, two questions have received less focus: 1) how to estimate independent associations in the context of many competing causes without resorting to a misspecified model, and 2) how to derive accurate small-sample inference when data-adaptive techniques are used in this context. This paper focuses on semi-parametric variable importance analysis of high-dimensional data sets of modest sample size (e.g., gene expression or mRNA data). Though the methodology we propose is generally applicable to similar situations, we present it in the context of a study of miRNA expression for an environmental exposure. Specifically, the analysis faces not just a large number of comparisons, but also the need to tease apart the association of miRNA expression with an exposure from confounders such as age, race, smoking status, and BMI. Our goal is to propose a method that is reasonably robust in small samples but does not rely on misspecified (arbitrary) parametric assumptions, and that is therefore based on data-adaptive methods. The proposed methodology is, we believe, a powerful combination of existing semi-parametric statistical methods and theory with a simple framework for using common empirical Bayes approaches to aid small-sample inference.
Specifically, we propose using targeted maximum likelihood estimation (TMLE) to estimate variable importance measures, along with a general adaptation of the commonly used Limma approach that relies on specifying the so-called influence curve of the proposed estimator. The result is a machine-based approach that can estimate independent associations in high-dimensional data while protecting against the unreliable inference that can result when data-adaptive estimation is used in relatively small samples.
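
The abstract does not spell out how the Limma-style adjustment is applied to the influence curve, so the following is only a minimal sketch of one plausible reading: the per-feature sample variance of the estimated influence-curve values is shrunk toward a common prior variance before the test statistic is formed. The function name `moderated_t` and the inputs `d0` (prior degrees of freedom) and `s0_sq` (prior variance) are hypothetical; Limma estimates these hyperparameters from the data, whereas this sketch takes them as given, and the TMLE step that would produce `psi` and `ic` is not shown.

```python
import math
from statistics import variance


def moderated_t(psi, ic, d0, s0_sq):
    """Limma-style moderated t-statistics built from influence-curve values.

    psi   : list of G point estimates (e.g. variable importance measures).
    ic    : list of G lists; ic[g] holds the n estimated influence-curve
            values for feature g (assumed to be centred near zero).
    d0    : prior degrees of freedom (hypothetical input, not estimated here).
    s0_sq : prior variance (hypothetical input, not estimated here).

    Returns a list of (moderated t-statistic, degrees of freedom) tuples.
    """
    results = []
    for est, ic_g in zip(psi, ic):
        n = len(ic_g)
        d_g = n - 1                      # residual degrees of freedom
        s_sq = variance(ic_g)            # sample variance of the influence curve
        # Shrink the feature-specific variance toward the prior variance.
        s_tilde_sq = (d0 * s0_sq + d_g * s_sq) / (d0 + d_g)
        se = math.sqrt(s_tilde_sq / n)   # standard error of the estimate
        results.append((est / se, d0 + d_g))
    return results
```

With `d0 = 0` the statistic reduces to the ordinary influence-curve-based t-statistic; larger values of `d0` pull noisy small-sample variances more strongly toward `s0_sq`, which is the stabilising effect the abstract attributes to the empirical Bayes step.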