Journal: Research journal of applied science, engineering and technology

A Precise Distance Metric for Mixed Data Clustering using Chi-square Statistics

Abstract

Data today often arrive as a mix of numerical and categorical values. Traditional clustering algorithms perform well on numerical data but produce poor results on mixed data. For better partitioning, the distance metric must be able to discriminate between data points with mixed attributes, and it should balance the categorical and numerical distances appropriately. In this study we propose a chi-square-based statistical approach to determine the weights of the attributes. The resulting weight vector is used to derive the distance matrix of the mixed dataset, and the distance matrix is then used to cluster the data points with traditional clustering algorithms. Experiments were carried out on the UCI benchmark datasets heart, credit and vote. In addition, we tested the proposed method on a real-time bank dataset. The clustering accuracy obtained is better than that of existing works.
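The abstract does not give the weighting formula, so the following is only an illustrative sketch of the general idea (chi-square statistics turned into attribute weights, then a weighted mixed-attribute distance matrix), not the authors' method. Everything specific here is an assumption made for illustration: the weights come from each attribute's chi-square statistic against a hypothetical reference partition `reference`, numerical attributes are binned into three equal-width intervals, and the distance is a weighted sum of range-normalized numerical differences and categorical mismatches. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def chi_square(xs, ys):
    """Pearson chi-square statistic for two equal-length lists of labels."""
    n = len(xs)
    row, col, obs = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            stat += (obs[(r, c)] - expected) ** 2 / expected
    return stat


def discretize(values, bins=3):
    """Equal-width binning so a numerical column can enter a contingency table."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def attribute_weights(rows, is_numeric, reference):
    """ASSUMPTION: weight = chi-square vs. a reference partition, normalized to sum to 1."""
    stats = []
    for j, numeric in enumerate(is_numeric):
        column = [r[j] for r in rows]
        if numeric:
            column = discretize(column)
        stats.append(chi_square(column, reference))
    total = sum(stats) or 1.0
    return [s / total for s in stats]


def distance_matrix(rows, is_numeric, weights):
    """Weighted mixed distance: normalized |difference| for numbers, 0/1 mismatch for categories."""
    n = len(rows)
    ranges = []
    for j, numeric in enumerate(is_numeric):
        if numeric:
            column = [r[j] for r in rows]
            ranges.append((max(column) - min(column)) or 1.0)
        else:
            ranges.append(None)
    dist = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        total = 0.0
        for j, numeric in enumerate(is_numeric):
            if numeric:
                diff = abs(rows[a][j] - rows[b][j]) / ranges[j]
            else:
                diff = 0.0 if rows[a][j] == rows[b][j] else 1.0
            total += weights[j] * diff
        dist[a][b] = dist[b][a] = total
    return dist


# Toy data (made up for illustration): one numerical and one categorical attribute.
rows = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.5, "b")]
is_numeric = [True, False]
reference = [0, 0, 1, 1]  # hypothetical reference partition

weights = attribute_weights(rows, is_numeric, reference)
D = distance_matrix(rows, is_numeric, weights)
```

The matrix `D` could then be passed to any clustering routine that accepts precomputed distances, which matches the abstract's statement that traditional clustering algorithms are applied to the derived distance matrix.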
