Handling Datasets in a Multi-Relational Environment: Cluster Dispersion vs Cluster Purity

Abstract

Clustering multiple instances in a multi-relational environment requires data transformations (e.g., data aggregation) that turn datasets stored in multiple tables into a single table. Unfortunately, most relational databases offer only a few basic aggregation methods (e.g., max, min, sum, count, avg) for continuous and categorical values, so data transformation is limited to these simple aggregates. In this paper, we propose a genetic semi-supervised clustering technique, which also finds the best number of clusters, as a means of aggregating data stored in multiple tables. The algorithm is suitable for classifying datasets with a high degree of one-to-many associations, in which a single record has multiple associated instances. The clustering algorithm can be used in two ways. The first is unsupervised clustering, in which the user controls the clustering result by optimizing the value of cluster dispersion. The second is semi-supervised clustering, in which an unsupervised clustering method is optimized with a genetic algorithm that incorporates the Gini index, a measure of classification accuracy used in decision tree algorithms. We examine both methods for dynamically clustering multiple instances as a means of aggregating them, and illustrate the effectiveness of the semi-supervised, genetic-algorithm-based clustering technique.
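
The abstract describes aggregating the "many" side of a one-to-many relation by clustering its instances, then scoring a clustering either by cluster dispersion (unsupervised) or by a Gini index that a genetic algorithm optimizes (semi-supervised). The Python sketch below illustrates that idea under assumptions of our own; it is not the authors' implementation. Plain k-means stands in for the clustering step. Cluster dispersion is assumed to be the mean squared distance of instances to their centroid. Each instance inherits its parent record's class label so that the size-weighted Gini impurity of the clusters can measure purity. A small genetic algorithm searches over the number of clusters k and per-feature weights. The aggregated output gives each record a count of its instances per cluster, in place of a single max, min, sum, count or avg value. All function names, the toy data and the `penalty` parameter are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sq_dist(a, b, weights):
    """Feature-weighted squared Euclidean distance."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def kmeans(points, k, weights, iters=20, seed=0):
    """Plain k-means (a stand-in clustering step); returns (assignments, centroids)."""
    centroids = random.Random(seed).sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c], weights))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def dispersion(points, assign, centroids, weights):
    """Cluster dispersion (assumed definition): mean squared distance to own centroid."""
    return sum(sq_dist(p, centroids[a], weights)
               for p, a in zip(points, assign)) / len(points)


def gini(assign, labels):
    """Size-weighted Gini impurity of class labels inside each cluster (0 = pure)."""
    groups = defaultdict(list)
    for a, y in zip(assign, labels):
        groups[a].append(y)
    n = len(labels)
    return sum(len(ys) / n * (1 - sum((m / len(ys)) ** 2 for m in Counter(ys).values()))
               for ys in groups.values())


def aggregate(record_ids, assign, k):
    """Flatten one-to-many data: per record, count its instances in each cluster."""
    rows = defaultdict(lambda: [0] * k)
    for r, a in zip(record_ids, assign):
        rows[r][a] += 1
    return dict(rows)


def genetic_search(points, labels, n_features, k_range=(2, 5),
                   pop=12, gens=15, penalty=0.02, seed=1):
    """Evolve (k, feature weights) to minimise Gini impurity + penalty * k."""
    rng = random.Random(seed)

    def fitness(ind):
        k, w = ind
        assign, _ = kmeans(points, k, w)
        return gini(assign, labels) + penalty * k

    population = [(rng.randint(*k_range), [rng.random() for _ in range(n_features)])
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)
        parents = population[: pop // 2]  # truncation selection: keep the best half
        children = []
        while len(parents) + len(children) < pop:
            (k1, w1), (k2, w2) = rng.sample(parents, 2)
            w = [rng.choice(pair) for pair in zip(w1, w2)]  # uniform crossover
            w = [min(1.0, max(0.0, x + rng.gauss(0, 0.1))) for x in w]  # mutation
            k = rng.choice((k1, k2)) + rng.choice((-1, 0, 1))  # mutate k by -1, 0 or +1
            children.append((min(k_range[1], max(k_range[0], k)), w))
        population = parents + children
    return min(population, key=fitness)


# Hypothetical toy data: record id -> (class label, its 2-D instances).
data = {
    "r1": ("pos", [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0]]),
    "r2": ("pos", [[0.1, 0.1], [0.3, 0.2]]),
    "r3": ("neg", [[5.0, 5.1], [4.9, 5.2]]),
    "r4": ("neg", [[5.2, 4.8], [5.1, 5.0], [4.8, 5.1]]),
}
record_ids = [r for r, (_, inst) in data.items() for _ in inst]
points = [p for _, (_, inst) in data.items() for p in inst]
labels = [y for _, (y, inst) in data.items() for _ in inst]  # instances inherit labels

k, w = genetic_search(points, labels, n_features=2)
assign, centroids = kmeans(points, k, w)
print("k =", k, "Gini =", round(gini(assign, labels), 3),
      "dispersion =", round(dispersion(points, assign, centroids, w), 3))
print(aggregate(record_ids, assign, k))
```

The `penalty * k` term is an assumption added because Gini impurity alone tends to fall as k grows (a clustering with one instance per cluster is trivially pure); the paper may balance the number of clusters differently. Calling `kmeans` directly with a user-chosen k and inspecting `dispersion` corresponds loosely to the unsupervised mode described in the abstract.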

Bibliographic Details

Source: IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems, 2007, 6 pages
Authors: Rayner Alfred; Dimitar Kazakov
Original format: PDF
Chinese Library Classification: TP274-53
Keywords: Data Aggregation; Clustering; Semi-supervised Clustering; Genetic Algorithm; Relational Data Mining; Data Pre-processing

Similar Documents

1. A New Algorithm for Fuzzy Clustering Handling Incomplete Dataset [J]. Abidi Balkis, Sadok Ben Yahia. International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms, 2014, no. 4.
2. Investigation of the AODV and the SDWCA QoS handling at different utilisation levels in adaptive clustering environments [J]. Al-Baadani Faris, Yousef Sufian Y., Tapaswi Shasikala. International Journal of Systems Assurance Engineering and Management, 2017, no. 1.
3. Periodic Dispersion-Corrected Approach for Isolation Spectroscopy of N-2 in an Argon Environment: Clusters, Surfaces, and Matrices [J]. Makina Y., Mahjoubi K., Benoit D. M. The Journal of Physical Chemistry A: Molecules, Spectroscopy, Kinetics, Environment, & General Theory, 2017, no. 21.
4. Handling Datasets in a Multi-Relational Environment: Cluster Dispersion vs Cluster Purity [C]. Rayner Alfred, Dimitar Kazakov. IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems, 2007.
5. Supervised precision ordinal clustering – A human-machine learning algorithm to create accurate clusters in big datasets: Application to Indiana water quality data with novel visualization techniques [D]. Singh, Sarabjit. 2014.
6. Review of methods for handling confounding by cluster and informative cluster size in clustered data [O]. Shaun Seaman, Menelaos Pavlou, Andrew Copas.
7. A Novel Clustering-based Class-association Rule Mining Method for Handling Class-Imbalanced Datasets [O]. 2020.
