Journal: International Journal of Data Mining and Bioinformatics

Using the two-population genetic algorithm with distance-based k-nearest neighbour voting classifier for high-dimensional data



Abstract

Owing to developments in computer technology, high-dimensional data has become a popular research topic. However, traditional statistical methods perform poorly when the number of variables (p) exceeds the sample size (n). Accordingly, this paper proposes a novel hybrid model that combines statistical methodology with data mining techniques for the classification of high-dimensional data. In the proposed model, Fisher's least significant difference test is first used for initial dimension reduction. Subsequently, a two-population genetic algorithm and a non-parametric statistical classification method (a distance-based k-nearest neighbour voting classifier) are used to evaluate and rank the importance of the variables. Furthermore, an outlier detection method is used when evaluating which variables are relevant for classification. Eight public gene expression datasets are used to compare the performance of the proposed model with that of existing methods. The experimental results indicate that the proposed model achieves higher classification accuracy than the existing methods.
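The abstract names a distance-based k-nearest neighbour voting classifier but does not say how distance enters the vote. As a rough illustration only, the sketch below assumes inverse-distance weighting with Euclidean distance, which is one common reading of "distance-based voting" and not necessarily the scheme used in the paper. The function name `distance_weighted_knn_predict`, the `eps` smoothing term, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def distance_weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-12):
    """Label `query` by inverse-distance-weighted voting among its k nearest samples.

    Assumption: each neighbour votes with weight 1 / (distance + eps).
    The paper's actual weighting scheme is not given in the abstract.
    """
    # Euclidean distance from the query to every training sample.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep only the k closest neighbours.
    dists.sort(key=lambda pair: pair[0])
    neighbours = dists[:k]
    # Closer neighbours contribute larger votes to their class.
    votes = defaultdict(float)
    for d, label in neighbours:
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)


# Toy two-variable "expression profiles" (made-up values for illustration).
train_X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]]
train_y = ["normal", "normal", "tumour", "tumour"]

print(distance_weighted_knn_predict(train_X, train_y, [0.1, 0.1], k=3))  # normal
print(distance_weighted_knn_predict(train_X, train_y, [1.0, 1.0], k=3))  # tumour
```

Under this weighting, a single very close neighbour can outvote two distant ones, which is the practical difference from plain majority voting. In a pipeline like the one the abstract describes, a classifier of this kind would presumably serve as the fitness evaluator for variable subsets proposed by the genetic algorithm, though the abstract does not give those details.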
