Conference: IEEE International Conference on Cybernetics

Analysis of correlation structure of data set for efficient pattern classification

Abstract

Pattern classification and clustering play an important role in a wide variety of applications in areas such as psychology and other social sciences, biology and medical sciences, pattern recognition, and data mining. Many algorithms for supervised and unsupervised classification have been developed to achieve high classification accuracy at lower computational cost. However, a given method may work well on some data sets and perform poorly on others, and for any particular data set it is difficult to identify the most suitable algorithm without trial and error. This suggests that the characteristics of a data set influence which classification algorithm suits it. In this work, data set characteristics are studied in terms of intra-attribute relationships, and a measure, MVS (multivariate score), is proposed to quantify the correlation structure of a data set and to group data sets into four categories: strong independent, weak independent, weak correlated, and strong correlated. The performance of different feature selection algorithms on each group is studied through simulation experiments on 63 publicly available benchmark data sets. The results verify that univariate methods yield a significant performance gain over multivariate methods on strong independent data sets, while multivariate methods perform better on strong correlated data sets.
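The abstract does not define how MVS is computed, so the following is only a rough sketch of the general idea of scoring a data set by its correlation structure and assigning it to one of the four groups. It assumes a stand-in score (the mean absolute Pearson correlation over all attribute pairs) and purely illustrative cut-off values. The function names `mean_abs_correlation` and `correlation_group` and the thresholds in `cuts` are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_abs_correlation(X):
    """Stand-in score (NOT the paper's MVS): mean absolute Pearson
    correlation over all distinct attribute pairs.

    X has shape (n_samples, n_attributes) with at least two
    non-constant attributes; otherwise the result is NaN.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)          # attributes are columns
    upper = np.triu_indices(corr.shape[0], k=1)  # each pair counted once
    return float(np.mean(np.abs(corr[upper])))


def correlation_group(score, cuts=(0.1, 0.3, 0.6)):
    """Map a score to one of the four group names used in the abstract.

    The cut-off values are illustrative assumptions only.
    """
    labels = (
        "strong independent",
        "weak independent",
        "weak correlated",
        "strong correlated",
    )
    return labels[int(np.searchsorted(cuts, score, side="right"))]


# Synthetic illustration: independent attributes vs. attributes that
# all share one underlying factor plus a little noise.
rng = np.random.default_rng(0)
independent = rng.normal(size=(500, 5))
shared = rng.normal(size=(500, 1))
correlated = shared + 0.1 * rng.normal(size=(500, 5))

for name, data in (("independent", independent), ("correlated", correlated)):
    score = mean_abs_correlation(data)
    print(name, round(score, 3), correlation_group(score))
```

Under a grouping of this kind, the abstract's finding would suggest trying univariate feature selection first on data sets that land in the strong independent group and multivariate selection on those in the strong correlated group.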
