
Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure



Abstract

The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their ability to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters, and therefore the number of rules, grows combinatorially with the number of predictor variables in the model. We relax these global constraints to a more generalizable local structure (BRL-LSS). BRL-LSS entails a more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design BRL-LSS with the same worst-case time complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using the area under the ROC curve (AUC) and accuracy. We measure model parsimony by noting the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS, and the state-of-the-art C4.5 decision tree algorithm, using 10-fold cross-validation on ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. We also conduct a feasibility study to demonstrate the general applicability of our BRL methods to the newer RNA sequencing gene-expression data.
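
As a rough illustration of why a local structure can yield fewer rules than a global one, the sketch below counts rule antecedents in two ways: one rule per joint assignment of all predictors (a full table, as under global constraints), and one rule per leaf of a decision tree (a local structure). This is not the BRL implementation. The gene names, the binary "low"/"high" discretization, the example tree, and the helper functions `global_rules` and `local_rules` are all hypothetical and chosen only to make the counting concrete.

```python
from itertools import product


def global_rules(predictors):
    """Return one rule antecedent per joint value assignment (a full table)."""
    names = list(predictors)
    combos = product(*(predictors[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def local_rules(tree, path=None):
    """Return one rule antecedent per leaf of a decision tree.

    A tree is either a leaf (any non-tuple value, here a class label)
    or a tuple (variable_name, {value: subtree}).
    """
    path = path or {}
    if not isinstance(tree, tuple):
        return [dict(path)]
    variable, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(local_rules(subtree, {**path, variable: value}))
    return rules


# Hypothetical predictors, each discretized into two values.
predictors = {
    "geneA": ["low", "high"],
    "geneB": ["low", "high"],
    "geneC": ["low", "high"],
}

# Hypothetical tree: branches stop early when the class is already determined.
tree = ("geneA", {
    "low": "class0",
    "high": ("geneB", {
        "low": "class0",
        "high": ("geneC", {"low": "class0", "high": "class1"}),
    }),
})

n_global = len(global_rules(predictors))  # 2 * 2 * 2 joint assignments
n_local = len(local_rules(tree))          # one rule per leaf
print(n_global, n_local)
```

With three binary predictors the full table needs 8 rules, while this particular tree covers the same input space with 4, because a branch such as `geneA = low` does not need to be split further on the remaining variables.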
