Briefings in Bioinformatics

Investigating the role of Simpson’s paradox in the analysis of top-ranked features in high-dimensional bioinformatics datasets

Abstract

An important problem in bioinformatics is identifying the most important features (or predictors) among a large number of features in a given classification dataset. This problem is often addressed by using a machine learning-based feature ranking method to identify a small set of top-ranked predictors (i.e. the most relevant features for classification). The many studies in this area have, however, an important limitation: they ignore the possibility that the top-ranked predictors occur in an instance of Simpson’s paradox, where the positive or negative association between a predictor and a class variable reverses sign upon conditioning on each of the values of a third (confounder) variable. In this work, we review and investigate the role of Simpson’s paradox in the analysis of top-ranked predictors in high-dimensional bioinformatics datasets, in order to avoid the potential danger of misinterpreting an association between a predictor and the class variable. We perform computational experiments using four well-known feature ranking methods from the machine learning field and five high-dimensional datasets of ageing-related genes, where the predictors are Gene Ontology terms. The results show that occurrences of Simpson’s paradox involving top-ranked predictors are much more common for one of the feature ranking methods.
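The sign reversal described in the abstract can be illustrated with a small check on binary data. The following is a minimal sketch, not the procedure used in the paper: it assumes a binary predictor `x`, a binary class `y` and a categorical third variable `z`, and it measures association as the difference P(y=1 | x=1) - P(y=1 | x=0). The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the paper's datasets.

```python
from collections import Counter


def association_sign(records):
    """Sign (+1, 0, -1) of P(y=1 | x=1) - P(y=1 | x=0).

    `records` is a list of (x, y) pairs with binary values.
    Returns None if either predictor group is empty.
    """
    n = Counter()
    pos = Counter()
    for x, y in records:
        n[x] += 1
        pos[x] += y
    if n[0] == 0 or n[1] == 0:
        return None
    diff = pos[1] / n[1] - pos[0] / n[0]
    return (diff > 0) - (diff < 0)


def is_simpsons_paradox(rows):
    """True if the overall x-y association has one sign and the
    association within every stratum of z has the opposite sign.

    `rows` is an iterable of (x, y, z) triples.
    """
    rows = list(rows)
    overall = association_sign([(x, y) for x, y, _ in rows])
    strata = {}
    for x, y, z in rows:
        strata.setdefault(z, []).append((x, y))
    signs = [association_sign(r) for r in strata.values()]
    # Require a well-defined, non-zero association everywhere.
    if overall in (None, 0) or any(s in (None, 0) for s in signs):
        return False
    return all(s == -overall for s in signs)


def expand(x, z, positives, total):
    """Turn aggregated counts into individual (x, y, z) rows."""
    return [(x, 1, z)] * positives + [(x, 0, z)] * (total - positives)


# Illustrative counts only (hypothetical, not from the paper).
rows = (
    expand(x=1, z=0, positives=81, total=87)
    + expand(x=0, z=0, positives=234, total=270)
    + expand(x=1, z=1, positives=192, total=263)
    + expand(x=0, z=1, positives=55, total=80)
)

print(is_simpsons_paradox(rows))  # True
```

In these counts the predictor is positively associated with the class within each value of `z` (81/87 vs 234/270, and 192/263 vs 55/80), yet negatively associated overall (273/350 vs 289/350). A top-ranked predictor with this pattern could be misread if only the overall association were inspected.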
