
Machine learning approaches to understanding the genetic basis of complex traits.


Abstract

Humans differ in many observable qualities, termed 'phenotypes', ranging from appearance to disease susceptibility. Many phenotypes are largely determined by each individual's specific 'genotype', stored in the 3.2 billion bases of his or her genome sequence. Deciphering the genome sequence by finding which sequence variations affect a certain phenotype would have a great impact on human life. The recent advent of high-throughput genotyping methods has enabled retrieval of an individual's sequence information on a genome-wide scale. Classical approaches have focused on finding a significant correlation between a sequence variation S and a particular phenotype P from the genotype and phenotype data. However, it is difficult to directly infer a causal relationship between S and P from limited data, because of: (1) the complexity of the cellular mechanisms through which S causes P, and (2) environmental factors that are not necessarily measurable.

In this dissertation, we present machine learning approaches that address these challenges by explicitly modeling an intermediate process between the genotype and phenotype. More specifically, we model the genetic regulatory mechanisms that are induced by sequence variations and that lead to the phenotype, and we learn the model from genome-wide mRNA expression measurements. Using the learned model, we aim to generate a finer-grained hypothesis such as: a sequence variation S induces regulatory interactions R, which lead to changes in the phenotype P.

To achieve this goal, our approach utilizes sophisticated machine learning techniques that can robustly select relevant biological interactions among a large number of possible interactions and can efficiently solve the optimization problem from a large amount of data. For example, our 'meta-prior algorithm' can learn the regulatory potential of each sequence variation based on its intrinsic characteristics, and this improvement helps to identify a true causal sequence variation among a large number of variations in the same chromosomal region. Our approaches have led to novel insights into sequence variations, and some of the hypotheses have been validated through biological experiments. Some of the machine learning techniques developed for biological problems are generally applicable to a wide-ranging set of applications such as collaborative filtering and natural language processing.

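Bibliographic record

  • Author: Lee, Su-In
  • Author affiliation: Stanford University
  • Degree-granting institution: Stanford University
  • Subject: Biology, Bioinformatics; Computer Science; Artificial Intelligence
  • Degree: Ph.D.
  • Year: 2009
  • Pagination: 191 p.
  • Total pages: 191
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Artificial intelligence theory; Automation technology, computer technology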

  • Date added to database: 2022-08-17 11:38:14
