Journal: Quality Control, Transactions

Learning From High-Dimensional Biomedical Datasets: The Issue of Class Imbalance

Abstract

As a vast body of literature attests, dimensionality reduction is a fundamental step in biomedical data analysis, a domain in which datasets often contain a huge number of attributes (or features). By removing irrelevant or redundant attributes, feature selection techniques can significantly reduce the complexity of the original problem, with important benefits for domain understanding and knowledge discovery. When learning from biomedical data, however, the dimensionality issue is often addressed without jointly considering other critical aspects that may compromise the performance of the induced models. The adverse implications of an imbalanced class distribution, for example, are often neglected in this domain. The aim of this work is to investigate the effectiveness of hybrid learning strategies that incorporate both dimensionality reduction methods and methods for alleviating class imbalance. Specifically, we combine different feature selection techniques, both univariate and multivariate, with sampling-based class balancing methods and cost-sensitive classification. The performance of the resulting learning schemes is experimentally evaluated on six high-dimensional genomic benchmarks using different classification algorithms, yielding insight into which strategies work best given the characteristics of the data at hand.
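To make the idea of a hybrid scheme concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary problem with labels 0 and 1, uses a simple signal-to-noise ranking as the univariate selector, random oversampling as the sampling-based balancer, and inverse-frequency class weights as one possible input to a cost-sensitive classifier. All function names and the toy data are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, pstdev


def univariate_scores(X, y):
    """Score each feature by |mean_pos - mean_neg| / (std_pos + std_neg)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = (pstdev(pos) + pstdev(neg)) or 1e-12  # guard against zero spread
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def select_top_k(X, y, k):
    """Return the sorted column indices of the k highest-scoring features."""
    scores = univariate_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, cols):
    """Keep only the selected columns of every row."""
    return [[row[j] for j in cols] for row in X]


def random_oversample(X, y, rng):
    """Duplicate random minority-class rows until both classes have equal size."""
    rows = {0: [], 1: []}
    for row, label in zip(X, y):
        rows[label].append(row)
    minority = min(rows, key=lambda c: len(rows[c]))
    majority = 1 - minority
    deficit = len(rows[majority]) - len(rows[minority])
    extra = [rng.choice(rows[minority]) for _ in range(deficit)]
    X_bal = rows[majority] + rows[minority] + extra
    y_bal = [majority] * len(rows[majority]) + [minority] * (len(rows[minority]) + deficit)
    return X_bal, y_bal


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n = len(y)
    return {c: n / (2 * y.count(c)) for c in (0, 1)}


# Hypothetical toy data: 6 majority rows (label 0), 2 minority rows (label 1).
# Only the middle feature separates the classes.
X = [
    [1.0, 0.1, 5.0], [2.0, 0.2, 4.0], [3.0, 0.0, 6.0],
    [1.5, 0.1, 5.5], [2.5, 0.3, 4.5], [2.0, 0.2, 5.0],
    [2.0, 3.0, 5.1], [1.8, 3.2, 4.9],
]
y = [0, 0, 0, 0, 0, 0, 1, 1]

selected = select_top_k(X, y, k=1)            # feature selection step
X_sel = project(X, selected)
X_bal, y_bal = random_oversample(X_sel, y, random.Random(0))  # balancing step
weights = balanced_class_weights(y)           # alternative: cost-sensitive weights
```

In this sketch, selection runs before balancing; the reverse order is equally possible, and which ordering and which balancing method work best is the kind of question the paper evaluates empirically. The oversampled data (`X_bal`, `y_bal`) and the class weights are alternatives rather than steps to be stacked: a classifier would typically use one or the other.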