
Optimizing decision tree ensembles for gene-gene interaction detection.



Abstract

In recent years, genome-wide association studies (GWAS) have been dedicated to unraveling the genetic etiology of complex diseases. It is widely accepted that most common diseases, such as neurodegenerative diseases (e.g., Alzheimer's and Parkinson's diseases), cardiovascular diseases, various cancers, diabetes, and osteoporosis, result from multiple genes, their interactions, environmental factors, and gene-by-environment interactions, and thus cannot be explained by a simple Mendelian inheritance model. Consequently, dissecting the gene-gene and/or gene-environment interactions involved in complex diseases and traits has become an active research topic in computational genomics. However, the high dimensionality of genotype data and the exponential growth of the search space with the order of the targeted interactions make most existing interaction detection strategies practically inapplicable.

Because they can capture interactions among input variables in addition to nonlinear effects, decision trees and their ensembles have recently been shown to be effective at detecting interactions in GWAS data. However, an individual decision tree (DT) suffers from several major limitations, most importantly high variance error, data fragmentation, and representational problems, which make it unreliable for stand-alone feature selection. Ensemble approaches have been proposed to increase the robustness of weak learners such as DTs by using multiple different and potentially complementary representations of the data. Some limitations of individual decision trees nevertheless persist at the ensemble level and may impair interaction detection performance.
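As a rough illustration of why tree-style partitioning can expose an interaction that single-variable tests miss, the sketch below builds a hypothetical toy data set (not taken from the dissertation) in which the phenotype is the XOR of two binary carrier indicators, with a third indicator as noise. The helper names `gini` and `split_gain` are assumptions made for this example. It compares the Gini impurity decrease of each single-feature split with that of a joint two-feature partition, which corresponds to a depth-2 tree that splits on the first feature and then on the second in both branches.

```python
from itertools import product


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_gain(rows, labels, features):
    """Impurity decrease when partitioning on the joint values of `features`."""
    groups = {}
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)
        groups.setdefault(key, []).append(y)
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


# Hypothetical toy data: three binary "carrier" indicators per sample,
# balanced over all 8 combinations (5 copies each).
rows = [combo for combo in product((0, 1), repeat=3) for _ in range(5)]
# Assumed pure-interaction phenotype: XOR of features 0 and 1; feature 2 is noise.
labels = [r[0] ^ r[1] for r in rows]

# Gain of each one-level split (what a greedy root split would evaluate).
single_gains = [split_gain(rows, labels, (f,)) for f in range(3)]
# Gain of the joint partition on features 0 and 1 (a depth-2 tree).
pair_gain = split_gain(rows, labels, (0, 1))

print("single-feature gains:", single_gains)
print("joint gain for features (0, 1):", pair_gain)
```

In this constructed case every single-feature split has zero gain while the joint partition is perfectly pure. That also hints at one limitation of greedy trees: the root split sees no signal to guide it toward the interacting pair.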
The objectives of this dissertation are to:

  • Study the systematic limitations of individual decision trees that may impact their interaction detection performance, and the possible solutions;
  • Investigate the application of decision tree ensembles to interaction detection with respect to the functional characteristics of the applied ensemble strategy;
  • Compare four well-known ensemble frameworks, namely AdaBoost, LogitBoost, Bagging, and Random Forest, and their pros and cons as far as interaction detection is concerned;
  • Provide a unified framework to optimize the application of DT ensembles in interaction detection.
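To make the abstract's remark about search-space growth concrete, the following minimal sketch counts how many candidate SNP combinations an exhaustive search would have to examine at each interaction order. The marker count of 500,000 and the function name `n_candidate_interactions` are assumptions chosen for illustration; neither comes from the dissertation.

```python
from math import comb  # available in Python 3.8+


def n_candidate_interactions(n_snps: int, order: int) -> int:
    """Number of distinct SNP subsets of size `order` among `n_snps` markers."""
    return comb(n_snps, order)


n_snps = 500_000  # hypothetical genome-wide marker count
for order in (1, 2, 3, 4):
    count = n_candidate_interactions(n_snps, order)
    print(f"order {order}: {count:.3e} candidate combinations")
```

Under this assumption there are about 1.25 × 10^11 candidate pairs and roughly 2 × 10^16 candidate triples, which is why exhaustive enumeration quickly becomes impractical as the interaction order increases.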

Bibliographic record

  • Author: Assareh, Amin
  • Author affiliation: Kent State University
  • Degree-granting institution: Kent State University
  • Subject: Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 149 p.
  • Original format: PDF
  • Language: English
