Classification of complex disease incorporating interaction effects and identifying causal SNPs in the context of rare variants.


Abstract

Over the past decade, genome-wide association studies (GWAS) have examined more than 40 complex diseases using microarray technology, and their genetic associations have been studied intensively. However, although some of these diseases are clearly inherited, only 5-10% of the disease variation can be explained by these studies. It is now widely believed that the missing heritability may be due either to the failure to incorporate interaction effects or to the neglect of rare-variant effects in the genome. This dissertation studies genetic association from these two aspects. In Essay I (Chapters 1-4), a classification algorithm that incorporates interactions among variables is proposed. The algorithm is assessed on several gene-expression datasets and yields much lower error rates than those reported in the literature. In Essay II (Chapters 5-8), a new approach built on established statistical methods is applied to identify causal SNPs in the context of rare variants. The resulting false-discovery rate is the lowest among the methods applied to the same dataset.

High dimensionality is one of the most challenging problems in the analysis of genetic data. In both studies, a search-by-layer framework is used to overcome it. In the first layer, high-quality markers are selected using certain statistics, reducing the number of variables from thousands to around a hundred in both studies. The second layer is project specific: in the first essay, subsets with high-order interactions are formed, and in the second, false-positive markers are eliminated. Both steps lead to the identification of a small pool of influential variables. In the first study, the identified variables are then used to form a classification rule.
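The search-by-layer idea described above can be illustrated with a minimal sketch. The abstract does not name the statistics used in either layer, so everything below is an assumption made for illustration: the first layer ranks variables by a standardized mean difference between cases and controls, and the second layer scores pairs of retained variables by the same statistic applied to their product. The function names (`make_toy_data`, `marginal_score`, `first_layer`, `second_layer`) and the toy data are hypothetical and are not taken from the dissertation.

```python
import random
from itertools import combinations
from statistics import mean, pvariance


def make_toy_data(n_samples=40, n_vars=200, seed=0):
    """Hypothetical toy data: variables 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0, 1) + (3.0 * y if j < 2 else 0.0) for j in range(n_vars)]
         for y in labels]
    return X, labels


def marginal_score(values, labels):
    """Absolute standardized mean difference between classes (assumed statistic)."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    spread = (pvariance(a) / len(a) + pvariance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0


def first_layer(X, labels, keep):
    """Layer 1: score each variable on its own and keep the top `keep` indices."""
    n_vars = len(X[0])
    scores = [marginal_score([row[j] for row in X], labels) for j in range(n_vars)]
    return sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:keep]


def second_layer(X, labels, candidates, top):
    """Layer 2 (assumed form): rank pairs of retained variables by their product term."""
    scored = []
    for i, j in combinations(candidates, 2):
        product = [row[i] * row[j] for row in X]
        scored.append((marginal_score(product, labels), (i, j)))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:top]]


X, y = make_toy_data()
selected = first_layer(X, y, keep=10)          # 200 variables reduced to 10
pairs = second_layer(X, y, selected, top=5)    # candidate interacting pairs
print(selected, pairs)
```

The point of the two layers is cost: scoring all pairs among 200 variables takes 19,900 evaluations, whereas scoring pairs among the 10 retained variables takes 45. One caveat of this particular sketch is that a purely marginal first-layer filter can discard variables that matter only through interaction; whether and how the dissertation addresses this is not stated in the abstract.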

Bibliographic details

  • Author: Wang, Haitian
  • Author affiliation: Hong Kong University of Science and Technology (Hong Kong)
  • Degree-granting institution: Hong Kong University of Science and Technology (Hong Kong)
  • Subject: Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 80 p.
  • Total pages: 80
  • Original format: PDF
  • Language: English
