Journal: The Annals of Applied Statistics

CAPTURING HETEROGENEITY OF COVARIATE EFFECTS IN HIDDEN SUBPOPULATIONS IN THE PRESENCE OF CENSORING AND LARGE NUMBER OF COVARIATES



Abstract

The advent of modern technology has led to a surge of high-dimensional data in biology and health sciences such as genomics, epigenomics and medicine. The high-grade serous ovarian cancer (HGS-OvCa) data reported by The Cancer Genome Atlas (TCGA) Research Network is one example. The TCGA and other research groups have analyzed several aspects of these data. Here we study the relationship between disease-free time (DFT) after surgery in ovarian cancer patients and the DNA methylation profiles of their genomic features. Such studies pose challenges beyond the typical big data problem because of population substructure and censoring. Although several methods exist for analyzing time-to-event data with a large number of covariates but a small sample size, no method available to date accommodates the additional feature of heterogeneity. To this end, we propose a regularized framework based on a finite mixture of accelerated failure time models that captures intangible heterogeneity due to population substructure and accounts for censoring simultaneously. We study the properties of the proposed framework both theoretically and numerically. Our data analysis indicates the existence of heterogeneity in the HGS-OvCa data, with one component of the mixture capturing a more aggressive form of the disease and the second component capturing a less aggressive form. In particular, the second component shows a significant positive relationship between methylation and DFT for BRCA1. By further unearthing the negative relationship between expression and methylation for this gene, one may provide a biologically reasonable explanation that sheds light on the relationship between DNA methylation, gene expression and mutation.
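
For orientation, a regularized finite mixture of accelerated failure time models for right-censored data is often written in the generic form sketched below. This is a textbook-style illustration, not the paper's own formulation: the notation, the error distribution behind f and S, and the penalty function p_λ are all assumptions made here, and the abstract does not specify any of them.

```latex
% Hypothetical notation (not taken from the paper):
%   y_i        = \min(\log T_i, \log C_i), observed log time for subject i
%   \delta_i   = 1 if the event is observed, 0 if the time is right-censored
%   x_i        = covariate vector of length p (e.g. methylation features)
%   K          = number of mixture components (hidden subpopulations)
%   \pi_k, \beta_k, \sigma_k = mixing proportion, coefficients, scale of component k
%   f, S       = density and survival function of the assumed error distribution
%   p_\lambda  = a sparsity-inducing penalty (e.g. lasso), chosen here for illustration

% Component k: \log T_i = x_i^\top \beta_k + \sigma_k \varepsilon_i, with probability \pi_k

% Censored-data log-likelihood of the mixture
\ell_n(\theta) = \sum_{i=1}^{n} \Bigg[
    \delta_i \log \sum_{k=1}^{K} \frac{\pi_k}{\sigma_k}
        f\!\left(\frac{y_i - x_i^\top \beta_k}{\sigma_k}\right)
  + (1 - \delta_i) \log \sum_{k=1}^{K} \pi_k
        S\!\left(\frac{y_i - x_i^\top \beta_k}{\sigma_k}\right)
\Bigg]

% Penalized objective to be maximized over \theta = (\pi_k, \beta_k, \sigma_k)_{k=1}^{K}
\tilde{\ell}_n(\theta) = \ell_n(\theta)
  - \sum_{k=1}^{K} \sum_{j=1}^{p} p_\lambda\big(|\beta_{kj}|\big)
```

In this sketch, uncensored subjects contribute the mixture density and censored subjects contribute the mixture survival probability, while the penalty shrinks component-specific coefficients so that each hidden subpopulation can retain a different sparse set of covariates.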
