
Low Dimensional Representation of Space Structure and Clustering of Categorical Data



Abstract

Dissimilarity measurement plays a key role in clustering analysis. Because categorical values have no natural order relation, clustering categorical data is harder than clustering numerical data. To improve the clustering quality of categorical data, the SBC (space structure based clustering) algorithm proposed a novel representation scheme for the space structure of such data. The scheme improved the discriminability of categorical data, but it also introduced two problems: low efficiency and high dimensionality. In this work, we prove that categorical data can be represented with the space structure more efficiently while maintaining the same clustering performance. To achieve this, a fraction of representative objects is selected as the reference set, from which a low-dimensional space structure matrix is built. Since the reference set directly affects the dissimilarity measure, a cluster-based method is proposed to obtain a better reference set. Theoretical analysis and experiments show that, compared with the SBC method, the proposed methods are more efficient and extensible while maintaining approximately the same clustering performance.
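The reference-set idea can be illustrated with a minimal sketch. It is not the paper's implementation: the simple-matching (Hamming) dissimilarity, the toy data, the hand-picked reference objects, and the names `hamming` and `space_structure` are all assumptions made for illustration. Each object is represented by its dissimilarity to the m reference objects only, which gives an n × m matrix rather than one column per object in the data set.

```python
def hamming(a, b):
    """Simple-matching dissimilarity (assumed here; the paper's measure may differ):
    the fraction of attributes on which two categorical objects disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def space_structure(data, reference):
    """Build an n x m matrix: row i holds the dissimilarity of object i
    to each of the m reference objects."""
    return [[hamming(obj, ref) for ref in reference] for obj in data]


# Toy categorical data set (hypothetical).
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]

# Hand-picked reference set (assumption); the abstract proposes a
# cluster-based method for choosing it, which is not reproduced here.
reference = [data[0], data[2]]

matrix = space_structure(data, reference)
for obj, row in zip(data, matrix):
    print(obj, row)
```

The rows of `matrix` are ordinary numerical vectors, so any numerical clustering method could then be applied to them; which method the authors use is not stated in the abstract.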
