Conference: International Conference on Intelligent Systems and Control

PSO based fast K-means algorithm for feature selection from high dimensional medical data set

Abstract

Features are the most important entities in any data mining or machine learning application and form the backbone of any model. The reliability, efficiency, and accuracy of a model depend on the choice of strong, relevant features. However, feature selection is a time-consuming and challenging task. In this paper, we propose an approach that combines a clustering technique with a stochastic technique to quickly select effective features from a high-dimensional breast cancer data set. To select strong, relevant features, we use an improved version of the K-means algorithm, called the fast K-means algorithm, which is much faster and more accurate than the general K-means algorithm. The fast K-means algorithm is embedded in the Particle Swarm Optimization (PSO) algorithm to produce better results. The results were validated using several classification techniques and evaluated on several performance measures, and they support the proposed approach. The feature subset generated by the PSO-based fast K-means algorithm on the KDDcup 2008 data set produced an accuracy of 99.39%, and its time complexity was found to be O(log(k)).
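
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of embedding K-means in PSO for feature selection, under several assumptions that are not from the paper: plain Lloyd's K-means stands in for the fast K-means variant, the fitness is cluster purity against class labels minus a small penalty per selected feature, and the binary PSO constants (inertia 0.7, acceleration 1.5) are conventional defaults. The names `kmeans`, `purity`, `fitness`, `pso_select`, and `make_toy` are hypothetical, and the toy data is synthetic, not the KDDcup 2008 data set.

```python
import math
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means (NOT the paper's fast variant); returns cluster labels."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, classes, k):
    """Fraction of points whose class is the majority class of their cluster."""
    total = 0
    for c in range(k):
        member_classes = [y for l, y in zip(labels, classes) if l == c]
        if member_classes:
            total += max(member_classes.count(y) for y in set(member_classes))
    return total / len(classes)

def fitness(mask, X, y, k, seed):
    """Assumed fitness: purity on the selected features minus 0.01 per feature."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot be clustered
    sub = [[row[j] for j in cols] for row in X]
    labels = kmeans(sub, k, random.Random(seed))
    return purity(labels, y, k) - 0.01 * len(cols)

def pso_select(X, y, k=2, particles=8, iters=15, seed=0):
    """Binary PSO over feature masks; returns (best mask, best fitness)."""
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(particles)]
    vel = [[0.0] * d for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, X, y, k, seed) for p in pos]
    g = max(range(particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(particles):
            for j in range(d):
                # Velocity update with assumed inertia 0.7 and acceleration 1.5.
                vel[i][j] = (0.7 * vel[i][j]
                             + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                             + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                # Sigmoid of the velocity is the probability of selecting feature j.
                prob = 1 / (1 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i], X, y, k, seed)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

def make_toy(n=40, seed=1):
    """Synthetic data: feature 0 separates two classes, features 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label * 5 + rng.gauss(0, 0.3)] + [rng.gauss(0, 3) for _ in range(3)])
        y.append(label)
    return X, y

X, y = make_toy()
mask, score = pso_select(X, y)
print(mask, round(score, 3))
```

In this sketch each particle is a 0/1 mask over the features, and the clustering step is rerun for every candidate mask, which is why a faster K-means variant would matter on a high-dimensional data set.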