
Finding Integrative Biomarkers from Biomedical Datasets: An application to Clinical and Genomic Data.

Abstract

Human diseases such as cancer, diabetes and schizophrenia are inherently complex and are governed by the interplay of many underlying factors, ranging from genetic and genomic influences to environmental effects. Recent advances in high-throughput data collection technologies in bioinformatics have produced a dramatic increase in diverse datasets that can provide information about such disease-related factors. These data include DNA microarrays providing cellular information, Single Nucleotide Polymorphisms (SNPs) providing genetic information, metabolomics data on proteins and other metabolites, structural and functional brain data from magnetic resonance imaging (MRI), and electronic health records (EHRs) containing copious information about histopathological, demographic, and environmental factors. Despite their richness, each of these datasets provides information about only part of the complex biological mechanism behind human diseases. Effectively integrating the partial information from these genomic and clinical datasets can therefore reveal disease complexity in greater detail, generating new data-driven hypotheses that go beyond traditional hypotheses about biomarkers. In particular, integrative biomarkers can lead to a customized and more effective approach to improving healthcare. These are patterns of features that are predictive of disease and that go beyond the simple biomarkers derived from a single dataset.

This thesis addresses key issues related to integrative biomarkers by developing new data mining approaches. One very important issue in biomarker discovery is that the models have to be easily interpretable. Integrative models must be not only predictive of the disease but also interpretable enough that domain experts can infer useful knowledge from the obtained patterns. In one effort to make models interpretable, domain information about disease relationships was used as prior knowledge during model development. In addition, a novel metric called the I-score was proposed, which uses the medical literature to quantify the interpretability of the obtained patterns.

Another key issue in integrative biomarker discovery is that many potential relationships may exist among diverse datasets. A particularly important type of relationship is interaction. An interaction is a biomarker that spans multiple datasets and whose combined features are more indicative of disease than the individual constituent factors. The individual effect of each type of factor on disease predisposition can be small. Such effects may therefore remain undetected by most disease association techniques applied to individual datasets. Different types of relationships are explored, and a framework based on association analysis is proposed to discover them. The framework is especially effective at discovering higher-order relationships, which the existing prominent integrative approaches to biomarker discovery cannot find. The approach was applied to real datasets of three different types collected from schizophrenic and normal subjects. It yielded significant integrated biomarkers that are biologically relevant.

Disease heterogeneity creates further issues for integrative biomarker discovery. Biomarkers obtained from clinicogenomic studies may not apply to all patients to the same degree, i.e., a disease consists of multiple subtypes, each occurring in a different subpopulation. Potential causes of disease heterogeneity include different pathways playing different roles in the same disease. Another cause is confounding factors such as age, ethnicity, race, or genetic predisposition, which may be available in rich EHR data. Most biomarker discovery techniques build full-space models, i.e., they assess the performance of biomarkers on all patients without identifying distinct subpopulations. In this thesis, more customized models were built according to patients' characteristics to handle disease heterogeneity.

In summary, the data mining techniques developed in this thesis advance the state of the art in integrating diverse biomedical datasets. Moreover, their application to large-scale EHR data yields significant discoveries. These discoveries can ultimately lead to new data-driven hypotheses that provide meaningful insight into complex disease mechanisms.
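The abstract describes an interaction as a pattern spanning several data types whose combined features are more indicative of disease than any constituent factor. The Python sketch below is a rough illustration of that idea only. It is not the association-analysis framework developed in the thesis. It scores each itemset by its support among cases minus its support among controls. It then keeps itemsets that beat every proper subset. The scoring measure, the brute-force search, the function names and the feature labels (`snp_A`, `mri_B`, `expr_C`) are all hypothetical choices made for illustration, and the toy data are made up.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets of features) that contain every item in itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def support_difference(itemset, cases, controls):
    """Illustrative (assumed) discriminative score: case support minus control support."""
    return support(itemset, cases) - support(itemset, controls)


def interaction_patterns(cases, controls, max_size=3, min_gain=0.0):
    """Brute-force search for itemsets of size 2..max_size whose score exceeds the
    best score of any proper non-empty subset by more than min_gain."""
    items = sorted(set().union(*cases, *controls))
    found = []
    for k in range(2, max_size + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            score = support_difference(pattern, cases, controls)
            # Best score reached by any smaller combination of the same features.
            best_subset = max(
                support_difference(frozenset(sub), cases, controls)
                for r in range(1, k)
                for sub in combinations(combo, r)
            )
            if score - best_subset > min_gain:
                found.append((pattern, score, best_subset))
    return found


if __name__ == "__main__":
    # Made-up binarized features from three hypothetical data types:
    # a SNP (snp_A), an MRI measurement (mri_B) and a gene-expression level (expr_C).
    cases = [{"snp_A", "mri_B"}, {"snp_A", "mri_B"}, {"expr_C"}, {"expr_C"}]
    controls = [{"snp_A"}, {"snp_A"}, {"mri_B"}, {"mri_B"}]
    for pattern, score, best in interaction_patterns(cases, controls):
        print(sorted(pattern), score, best)
    # prints: ['mri_B', 'snp_A'] 0.5 0.0
```

In this toy input, `snp_A` and `mri_B` each occur equally often in cases and controls, so neither looks like a marker on its own. Their combination occurs only in cases. This is the situation the abstract says single-dataset association techniques can miss. The brute-force enumeration grows combinatorially with the number of features. For that reason it is only practical for tiny inputs. The support difference is also not anti-monotone, so it cannot be used directly as an Apriori-style pruning criterion.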

Bibliographic details

  • Author

    Dey, Sanjoy.

  • Author affiliation

    University of Minnesota.

  • Degree-granting institution: University of Minnesota.
  • Subject: Computer science.
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 191
  • Original format: PDF
  • Language: English