Conference: 9th Roedunet International Conference

Improving heterogeneous data clustering by using metadata and compression algorithms

Abstract

Today we have to deal with large quantities of unstructured, heterogeneous data produced by an increasing number of sources. Clustering heterogeneous data is essential for returning structured information in response to user queries. In this paper, we assess the results of a new clustering technique, clustering by compression, when applied to metadata associated with heterogeneous data sets. The clustering by compression procedure is based on a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files (individually and in pairwise concatenation). Experimental results show that using metadata can improve average clustering performance by about 20% compared with clustering the same sample data set without metadata.
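To make the distance concrete, here is a minimal sketch of the NCD in its commonly cited form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the length of the compressed input. The abstract does not say which compressor the authors used, so the use of zlib here is an assumption, and the function names (`compressed_len`, `ncd`, `distance_matrix`) and the sample metadata strings are hypothetical, chosen only for illustration.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_len(x)       # each item compressed on its own
    cy = compressed_len(y)
    cxy = compressed_len(x + y)  # the pair compressed as one concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD values, which a clustering step could take as input."""
    return [[ncd(a, b) for b in items] for a in items]


if __name__ == "__main__":
    # Hypothetical metadata records, not taken from the paper.
    doc_a = b"title=report; type=pdf; author=alice; " * 20
    doc_b = b"title=report; type=pdf; author=bob; " * 20
    other = bytes(range(256)) * 4
    for row in distance_matrix([doc_a, doc_b, other]):
        print(["%.3f" % d for d in row])
```

With a real compressor the values only approximate the ideal distance: NCD(x, x) is small but usually not exactly zero, and unrelated inputs land near 1. Similar metadata records should therefore score lower against each other than against unrelated data.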
