Journal: Bioinformatics

Bayesian rule learning for biomedical data mining

Abstract

Motivation: Disease state prediction from biomarker profiling studies is an important problem because more accurate classification models will potentially lead to the discovery of better, more discriminative markers. Data mining methods are routinely applied to such analyses of biomedical datasets generated by high-throughput 'omic' technologies applied to clinical samples from tissues or bodily fluids. Past work has demonstrated that rule models can be successfully applied to this problem, since they produce understandable models that facilitate review of discriminative biomarkers by biomedical scientists. While many rule-based methods produce rules that make predictions under uncertainty, they typically do not quantify the uncertainty in the validity of the rule itself. This article describes an approach that uses a Bayesian score to evaluate rule models.

Results: We have combined the expressiveness of rules with the mathematical rigor of Bayesian networks (BNs) to develop and evaluate a Bayesian rule learning (BRL) system. The system uses a novel variant of the K2 algorithm, which builds BNs from the training data, together with heuristic best-first search to provide probabilistic scores for IF-antecedent-THEN-consequent rules. We then apply rule-based inference to evaluate the learned models in 10-fold cross-validation performed two times. The BRL system is evaluated on 24 published 'omic' datasets, and on average it performs on par with or better than other readily available rule learning methods. Moreover, BRL produces models that contain on average 70% fewer variables, which means that the biomarker panels for disease prediction contain fewer markers for further verification and validation by bench scientists.
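
To make the idea of scoring rule models with a Bayesian score more concrete, the following is a minimal sketch and not the BRL system itself. It assumes the standard K2 (Cooper-Herskovits) metric rather than the novel variant the abstract mentions, and it swaps the heuristic best-first search for a simple greedy forward search. The function names, the toy dataset, the marker names `m1` and `m2`, and the `max_parents` limit are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import lgamma


def count_by_config(rows, target, parents):
    """Count target states for each observed configuration of the parents."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[target]] += 1
    return counts


def log_k2_score(rows, target, parents):
    """Log of the standard K2 metric for `target` given a candidate parent set.

    For each parent configuration j with counts N_jk over r target states:
    log[(r-1)! / (N_j + r - 1)!] + sum_k log(N_jk!).
    Unobserved configurations contribute 0, so only observed ones are visited.
    """
    r = len({row[target] for row in rows})
    score = 0.0
    for target_counts in count_by_config(rows, target, parents).values():
        n_j = sum(target_counts.values())
        score += lgamma(r) - lgamma(n_j + r)
        score += sum(lgamma(n + 1) for n in target_counts.values())
    return score


def greedy_parents(rows, target, candidates, max_parents=2):
    """Greedy forward search (an assumption; the paper uses best-first search)."""
    parents = []
    best = log_k2_score(rows, target, parents)
    while len(parents) < max_parents:
        scored = [(log_k2_score(rows, target, parents + [c]), c)
                  for c in candidates if c not in parents]
        if not scored:
            break
        top_score, top_var = max(scored)
        if top_score <= best:  # stop when no candidate improves the score
            break
        parents.append(top_var)
        best = top_score
    return parents, best


def extract_rules(rows, target, parents):
    """Turn each parent configuration into an IF-antecedent-THEN-consequent rule.

    The attached probability is the posterior mean under a uniform prior:
    (N_jk + 1) / (N_j + r).
    """
    states = sorted({row[target] for row in rows})
    rules = []
    for config, tc in sorted(count_by_config(rows, target, parents).items()):
        n_j = sum(tc.values())
        label = max(states, key=lambda s: tc[s])
        prob = (tc[label] + 1) / (n_j + len(states))
        rules.append((dict(zip(parents, config)), label, prob))
    return rules


# Hypothetical toy data: marker m1 separates the classes, m2 is uninformative.
rows = [
    {"m1": "high", "m2": "high", "disease": "case"},
    {"m1": "high", "m2": "low", "disease": "case"},
    {"m1": "high", "m2": "high", "disease": "case"},
    {"m1": "high", "m2": "low", "disease": "case"},
    {"m1": "low", "m2": "high", "disease": "control"},
    {"m1": "low", "m2": "low", "disease": "control"},
    {"m1": "low", "m2": "high", "disease": "control"},
    {"m1": "low", "m2": "low", "disease": "control"},
]

if __name__ == "__main__":
    parents, score = greedy_parents(rows, "disease", ["m1", "m2"])
    for antecedent, label, prob in extract_rules(rows, "disease", parents):
        conditions = " AND ".join(f"{k} = {v}" for k, v in antecedent.items())
        print(f"IF {conditions} THEN disease = {label} (p = {prob:.2f})")
```

On this toy data the search keeps only `m1`, because adding `m2` lowers the score. This loosely mirrors the abstract's point that a Bayesian score can favor models with fewer variables, although the sketch makes no claim about how BRL achieves its reported reduction.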
