Journal: BMC Genetics

Data mining of the GAW14 simulated data using rough set theory and tree-based methods


Abstract

Rough set theory and decision trees are data mining methods for dealing with vagueness and uncertainty. They have been used to uncover hidden patterns in complicated datasets collected from industrial processes. The Genetic Analysis Workshop 14 (GAW14) simulated data were generated by a system that implemented multiple correlations among four successive layers of genetic data (disease-related loci, endophenotypes, phenotypes, and one disease trait). When the information in one layer was blocked, creating uncertainty in the correlations among these layers, the correlation between the first and last layers (here, susceptibility genes and the disease trait) could not easily be detected directly. In this study, we proposed a two-stage process that applied rough set theory and decision trees to identify susceptibility genes for the disease trait. In the first stage, decision trees were built to predict trait values from the phenotypes of subjects and their parents. The phenotypes retained in the decision trees were then carried forward to the second stage, where rough set theory was applied to find the minimal subsets of genes associated with the disease trait. For comparison, decision trees were also constructed in the second stage to map susceptibility genes. Our results showed that the first-stage decision trees predicted the disease trait with accuracy rates of about 99%. In the second stage, however, neither the decision trees nor rough set theory identified the true disease-related loci.
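To make the idea of "minimal subsets of genes" more concrete, the following is a minimal sketch of how a rough set reduct can be computed on a small discrete decision table. It is not the authors' implementation: the marker names (g1, g2, g3), the toy data, and the function names are all hypothetical, and the search is a plain brute force that would not scale to genome-wide data. A reduct here is taken to be a smallest attribute subset that preserves the positive region of the full attribute set.

```python
from itertools import combinations


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return list(blocks.values())


def positive_region(rows, attrs, decision):
    """Row indices whose indiscernibility class has a single decision value."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos


def minimal_reducts(rows, attrs, decision):
    """Smallest attribute subsets that preserve the full positive region.

    Brute-force search by increasing subset size (illustrative only).
    """
    target = positive_region(rows, attrs, decision)
    for size in range(1, len(attrs) + 1):
        found = [set(c) for c in combinations(attrs, size)
                 if positive_region(rows, c, decision) == target]
        if found:
            return found
    return []


# Hypothetical toy table: three binary markers and a binary trait.
# In this made-up data the trait happens to equal marker g2.
rows = [
    {"g1": 0, "g2": 0, "g3": 1, "trait": 0},
    {"g1": 0, "g2": 1, "g3": 0, "trait": 1},
    {"g1": 1, "g2": 0, "g3": 0, "trait": 0},
    {"g1": 1, "g2": 1, "g3": 1, "trait": 1},
]

reducts = minimal_reducts(rows, ["g1", "g2", "g3"], "trait")
print(reducts)
```

On this toy table the only single-marker subset that separates the trait values is g2, so the search stops at size one. Real genotype data are noisy, which is one reason an exact positive-region criterion like this one may fail to recover true loci.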
