Proceedings of the 2010 10th International Conference on Intelligent Systems Design and Applications

Using a clustering similarity measure for feature selection in high dimensional data sets

Abstract

Feature selection is an important preprocessing step in data classification. Applying it reduces the dimensionality of the problem by removing redundant or irrelevant data. High-dimensional data sets are now common, especially in bioinformatics, biology, signal processing, and text classification, which increases the need for efficient feature selection methods. In this paper we study the applicability of a clustering validation measure, the Adjusted Rand Index (ARI), to this task, comparing it with other methods based on statistical tests and on the ROC curve. We report experiments that show the validity of the proposed method.
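
The abstract does not say how the ARI is turned into a per-feature score, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes each feature is split into two groups at its median (a stand-in for a one-dimensional clustering) and that the ARI between that split and the class labels is used to rank features. The helper names `split_at_median` and `rank_features_by_ari` are hypothetical; `adjusted_rand_index` follows the standard pair-counting definition of the ARI.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same samples."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two partitions of the same (at least 2) samples")
    # Pair counts from the contingency table and from its two marginals.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2         # largest attainable agreement
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one group.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def split_at_median(values):
    """ASSUMPTION: two groups, below vs. at-or-above the (upper) median."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return [int(v >= median) for v in values]


def rank_features_by_ari(rows, classes):
    """Return (feature_index, ARI) pairs, most class-aligned feature first."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((j, adjusted_rand_index(split_at_median(column), classes)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data (made up for illustration): feature 0 separates the classes,
# feature 1 does not.
rows = [
    [0.1, 5.0],
    [0.2, 1.0],
    [0.3, 4.0],
    [0.9, 2.0],
    [1.0, 6.0],
    [1.1, 3.0],
]
classes = [0, 0, 0, 1, 1, 1]
ranking = rank_features_by_ari(rows, classes)
```

In this toy example the first feature is ranked at the top because its median split reproduces the class labels exactly. A real application would need a more careful partitioning step and a rule for how many top-ranked features to keep; both are left open here.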