Conference: Roedunet International Conference

Improving Heterogeneous Data Clustering by Using Metadata and Compression Algorithms



Abstract

Today we must handle large quantities of unstructured, heterogeneous data produced by a growing number of sources. Clustering heterogeneous data is essential for returning structured information in response to user queries. In this paper, we assess the results of a new clustering technique, clustering by compression, when applied to metadata associated with heterogeneous sets of data. The clustering by compression procedure is based on a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files (singly and in pairwise concatenation). Experimental results show that using metadata could improve average clustering performance by about 20% compared with clustering the same sample data set without metadata.
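As a rough illustration of the distance the abstract describes, the sketch below computes NCD from the compressed lengths of two inputs taken singly and concatenated, using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as the compressor, the function names, and the sample metadata strings are assumptions made for this example; the abstract does not say which compressor or metadata fields the authors used.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)  # pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD values, usable as input to a clustering step."""
    return [[ncd(a, b) for b in items] for a in items]


# Hypothetical metadata records, invented purely for illustration.
photo_meta = b"title: sunset photo; format: jpeg; camera: nikon; " * 20
audio_meta = b"author: unknown; type: audio/wav; duration: 185 s; " * 20

same = ncd(photo_meta, photo_meta)      # identical inputs: small distance
different = ncd(photo_meta, audio_meta)  # unrelated inputs: larger distance
matrix = distance_matrix([photo_meta, audio_meta])
```

Real compressors only approximate the ideal behaviour, so the distance between identical inputs is small rather than exactly zero, and values can slightly exceed 1. The matrix is only a starting point: the clustering step itself is not shown here.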
