Journal: Neurocomputing

Learning category distance metric for data clustering

Abstract

Unsupervised learning of adaptive distance metrics for categorical data remains a challenge because it is difficult to define an inherently meaningful measure that parameterizes the heterogeneity within matched or mismatched categorical symbols. In this paper, a new distance metric called category distance and a non-center-based algorithm are proposed for categorical data clustering. The new metric is formulated from category weights for each categorical attribute and no longer depends on the common assumption that all categories on the same attribute are independent of each other. The problem of learning the category distance is therefore transformed into the problem of learning a set of category weights, which can be optimized jointly with the clusters. A case study on DNA sequences and experimental results on ten real-world data sets from different domains demonstrate the performance of the proposed methods in comparison with existing distance measures for categorical data. (C) 2018 Elsevier B.V. All rights reserved.
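
The abstract does not give the formula for the category distance, so the following is only an illustrative sketch of the general idea of a distance driven by per-attribute category weights. The function name `category_distance`, the rule that a mismatch costs the sum of the two categories' weights, and all weight values are assumptions made for illustration. They are not taken from the paper, and the weight-learning step is not shown.

```python
from typing import Dict, List, Sequence

# One weight table per attribute: category symbol -> weight.
# Hypothetical structure; the paper's actual parameterization may differ.
Weights = List[Dict[str, float]]


def category_distance(x: Sequence[str], y: Sequence[str], weights: Weights) -> float:
    """Weighted mismatch distance between two categorical objects.

    Assumption (not from the paper): a match on an attribute costs 0, and a
    mismatch between categories a and b on attribute d costs
    weights[d][a] + weights[d][b].
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same number of attributes")
    total = 0.0
    for d, (a, b) in enumerate(zip(x, y)):
        if a != b:
            total += weights[d][a] + weights[d][b]
    return total


# Toy DNA-like data: each sequence position is one categorical attribute.
uniform = [{"A": 0.5, "C": 0.5, "G": 0.5, "T": 0.5} for _ in range(3)]
skewed = [{"A": 0.1, "C": 0.9, "G": 0.1, "T": 0.9} for _ in range(3)]

s1, s2 = "ACG", "ATT"
d_uniform = category_distance(s1, s2, uniform)  # every mismatch costs the same
d_skewed = category_distance(s1, s2, skewed)    # mismatches cost different amounts
```

With uniform weights the sketch reduces to counting mismatches. With non-uniform weights, mismatches involving different categories contribute differently, which is the kind of dependence among categories that a fixed mismatch count cannot express.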
