A cohesion-based clustering technique for categorical data.

Abstract

Clustering is a technique that aims to partition a given dataset of objects into groups of similar objects. In this work, we consider categorical data, whose values, unlike numerical data, have no natural order. This makes clustering such data more challenging. We propose a clustering technique for categorical data that uses a novel similarity function, called cohesion, to measure the degree to which objects "stick" to clusters. We have implemented this technique, which we refer to as CLUC (CLUstering with Cohesion). To evaluate CLUC, we compared its results with those produced by well-known clustering algorithms. The results of our extensive experiments on real and synthetic datasets show that CLUC generates high-quality clusters that conform more closely to clusterings produced by human experts. For some well-known real datasets, CLUC even discovers clusterings identical to those provided by experts. Our results also indicate that CLUC is generally order-insensitive and scales well as the dataset grows in size (number of objects) and/or dimensionality (number of attributes).
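The abstract describes cohesion only informally, as the degree to which an object "sticks" to a cluster. The Python sketch below illustrates that idea under clearly stated assumptions: `stickiness` is a hypothetical score (the mean, over attributes, of the fraction of cluster members that share the object's value), `cluster` is a hypothetical single-pass greedy assignment with an assumed `threshold` parameter, and `purity` is a standard external measure for comparing a clustering with expert labels. None of these is CLUC's actual cohesion function or the evaluation metric used in the thesis.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, ...]


def stickiness(obj: Record, counts: List[Counter], size: int) -> float:
    """Hypothetical cohesion-style score in [0, 1]: the mean, over attributes,
    of the fraction of cluster members sharing obj's value (NOT CLUC's definition)."""
    if size == 0:
        return 0.0
    return sum(counts[a][v] / size for a, v in enumerate(obj)) / len(obj)


def cluster(data: Sequence[Record], threshold: float = 0.5) -> List[int]:
    """Hypothetical single-pass greedy clustering: assign each object to the
    cluster it sticks to most, or open a new cluster if the best score is
    below `threshold` (an assumed tuning parameter)."""
    n_attrs = len(data[0])
    cluster_counts: List[List[Counter]] = []  # per cluster: one value Counter per attribute
    sizes: List[int] = []  # number of objects in each cluster
    labels: List[int] = []
    for obj in data:
        best, best_score = -1, -1.0
        for i in range(len(sizes)):
            score = stickiness(obj, cluster_counts[i], sizes[i])
            if score > best_score:
                best, best_score = i, score
        if best == -1 or best_score < threshold:
            # No existing cluster is cohesive enough: start a new one.
            cluster_counts.append([Counter() for _ in range(n_attrs)])
            sizes.append(0)
            best = len(sizes) - 1
        # Update the chosen cluster's value frequencies with this object.
        for a, v in enumerate(obj):
            cluster_counts[best][a][v] += 1
        sizes[best] += 1
        labels.append(best)
    return labels


def purity(labels: Sequence[int], truth: Sequence[str]) -> float:
    """Standard purity: share of objects that belong to the majority expert
    class of their cluster (1.0 means every cluster is pure)."""
    by_cluster: Dict[int, Counter] = {}
    for c, t in zip(labels, truth):
        by_cluster.setdefault(c, Counter())[t] += 1
    return sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values()) / len(truth)


# Toy categorical data: (colour, shape, size), with expert class labels.
data = [
    ("red", "round", "small"),
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("green", "long", "large"),
    ("green", "long", "small"),
    ("green", "long", "large"),
]
truth = ["A", "A", "A", "B", "B", "B"]

labels = cluster(data, threshold=0.5)
print(labels)                  # [0, 0, 0, 1, 1, 1]
print(purity(labels, truth))   # 1.0
```

On this toy dataset the sketch recovers the two expert classes. A single-pass greedy scheme like this one is generally sensitive to input order, so it should not be read as evidence for the order insensitivity the thesis reports for CLUC.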

Bibliographic details

  • Author: Nemalhabib, Aida
  • Author affiliation: Concordia University (Canada)
  • Degree-granting institution: Concordia University (Canada)
  • Discipline: Computer Science
  • Degree: M.Comp.Sc.
  • Year: 2006
  • Pages: 87
  • Original format: PDF
  • Language: English
