
Improving Heterogeneous Data Clustering by Using Metadata and Compression Algorithms


Abstract

Nowadays, we have to deal with a large quantity of unstructured, heterogeneous data produced by an increasing number of sources. Clustering heterogeneous data is essential for returning structured information in response to user queries. In this paper, we assess the results of a new clustering technique, clustering by compression, when applied to metadata associated with heterogeneous sets of data. The clustering by compression procedure is based on a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files (singly and in pairwise concatenation). Experimental results show that using metadata could improve average clustering performance by about 20% over clustering the same sample data set without metadata.
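The abstract describes NCD only in words. As a minimal sketch, the commonly cited definition is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes. The example below assumes zlib as the compressor and uses made-up metadata strings; the abstract does not say which compressor or which metadata fields the authors used, so every name and value here is hypothetical.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance from single and concatenated compressed lengths."""
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)  # pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical metadata records, invented for illustration only.
report_2009 = b"title=Annual report; type=pdf; author=Finance team; year=2009; pages=42"
report_2010 = b"title=Annual report; type=pdf; author=Finance team; year=2010; pages=45"
video_clip = b"codec=h264; duration=00:03:12; resolution=1280x720; fps=25; audio=aac"

if __name__ == "__main__":
    # Records with similar metadata should yield a smaller distance.
    print("report vs report:", ncd(report_2009, report_2010))
    print("report vs video: ", ncd(report_2009, video_clip))
```

A smaller value means the two inputs share more compressible structure. A full pairwise matrix of such distances could then be passed to any distance-based clustering method; which one the paper uses is not stated in the abstract.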

Bibliographic record

Source: Roedunet International Conference, 2010, 5 pages
Authors: Alexandra Cernian; Dorin Carstoiu; Valentin Sgarciu
Original format: PDF
Chinese Library Classification: TP393-53
Keywords: Clustering by compression; Normalized compression distance; Heterogeneous data; Metadata

Similar literature

1. Glowworm Swarm Optimization Algorithm- and K-Prototypes Algorithm-Based Metadata Tree Clustering [J]. Yaping Li. Mathematical Problems in Engineering: Theory, Methods and Applications. 2021, issue a.
2. Communication scheduling in data gathering networks of heterogeneous sensors with data compression: Algorithms and empirical experiments [J]. Luo Wenchang, Gu Boyuan, Lin Guohui. European Journal of Operational Research. 2018, issue 2.
3. MetaStore: an adaptive metadata management framework for heterogeneous metadata models [J]. Ajinkya Prabhune, Rainer Stotzka, Vaibhav Sakharkar. Distributed and Parallel Databases. 2018, issue 1.
4. Improving heterogeneous data clustering by using metadata and compression algorithms [C]. Cernian Alexandra, Carstoiu Dorin, Sgarciu Valentin. 9th Roedunet International Conference. 2010.
5. Improved EM-Type Algorithms for Fitting Marginal Zero-Inflated Regression Models to Clustered Data with Excess Zeros [D]. Benesi, Tawanda. 2020.
6. Performance evaluation results of evolutionary clustering algorithm star for clustering heterogeneous datasets [O]. Bryar A. Hassan, Tarik A. Rashid, Seyedali Mirjalili. 2021.
7. Performance evaluation of a load self-balancing method for heterogeneous metadata server cluster using trace-driven and synthetic workload simulation [O]. Bin Cai, Changsheng Xie, Guangxi Zhu. 2007.
8. Cluster Compression Algorithm: A Joint Clustering/Data Compression Concept [R]. Edward E. Hilbert. 1977.
