Conference: IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems

Handling Datasets in a Multi-Relational Environment: Cluster Dispersion vs Cluster Purity

Abstract

Clustering multiple instances in a multi-relational environment requires data transformations (e.g. data aggregation) that bring datasets stored in multiple tables into a single table. Unfortunately, most relational databases offer only a few basic aggregation methods (e.g. max, min, sum, count, avg) for continuous and categorical values, so data transformation is limited to aggregating values of these two kinds. In this paper, to determine the best number of clusters, we propose a genetic semi-supervised clustering technique as a means of aggregating data stored in multiple tables. The algorithm is suitable for classifying datasets with a high degree of one-to-many association, in which a single record has multiple instances associated with it. The clustering algorithm can be used in two ways. The first is unsupervised clustering, in which the user controls the clustering result by optimizing the value of cluster dispersion. The second is semi-supervised clustering, in which the unsupervised clustering method is optimized with a genetic algorithm that incorporates the Gini index, a measure of classification accuracy used in decision tree algorithms. We examine both methods for dynamically clustering multiple instances as a means of aggregating them, and illustrate the effectiveness of the semi-supervised genetic algorithm-based clustering technique.
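
The two quantities the abstract contrasts can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: dispersion is taken to be the mean Euclidean distance from each instance to its cluster centroid, purity is measured by the size-weighted Gini impurity of the class labels within each cluster, and the function names `gini_impurity`, `weighted_gini`, and `dispersion` are hypothetical.

```python
from collections import Counter
from math import sqrt


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of the class labels in one cluster."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(assignments, labels):
    """Size-weighted mean Gini impurity over all clusters (lower = purer).

    Assumption: this stands in for the paper's purity measure, whose
    exact definition is not stated in the abstract.
    """
    clusters = {}
    for cluster_id, label in zip(assignments, labels):
        clusters.setdefault(cluster_id, []).append(label)
    total = len(labels)
    return sum(len(m) / total * gini_impurity(m) for m in clusters.values())


def dispersion(points, assignments):
    """Mean Euclidean distance from each point to its cluster centroid.

    Assumption: one simple notion of cluster dispersion, not necessarily
    the one optimized in the paper.
    """
    groups = {}
    for point, cluster_id in zip(points, assignments):
        groups.setdefault(cluster_id, []).append(point)
    total_distance = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total_distance += sum(
            sqrt(sum((m[d] - centroid[d]) ** 2 for d in range(dim)))
            for m in members
        )
    return total_distance / len(points)


# Four instances belonging to one record, with hypothetical class labels.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = ["a", "a", "b", "b"]
assignments = [0, 0, 1, 1]  # candidate clustering with two clusters

print(dispersion(points, assignments))     # compactness of the clusters
print(weighted_gini(assignments, labels))  # class purity of the clusters
```

In the unsupervised setting only `dispersion` would be available as an objective, while the semi-supervised setting can also score a candidate clustering by label purity. How the paper's genetic algorithm combines or uses these measures is not specified in the abstract.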