Journal: Chemometrics and Intelligent Laboratory Systems

Clustering algorithm for mixed datasets using density peaks and Self-Organizing Generative Adversarial Networks



Abstract

This paper presents a new method, Density-Peaks and Self-Organizing Generative Adversarial Networks (DP-SO-GAN), for clustering mixed datasets. Many clustering methods assume that a dataset contains either categorical or numerical attributes, but not both. In real-world applications, however, most datasets mix categorical and numerical attributes. Clustering such data is a vital and challenging problem; in medicine, for example, the clustering of cardiovascular disease data is an essential task. First, we transform the mixed attributes: categorical attributes with one-hot encoding and numerical attributes with normalization. The transformed features are input to a Self-Organizing Generative Adversarial Network (SO-GAN) to learn the feature map. Second, we train two kernel networks, the generator and the discriminator, each holding only a small number of convolution kernels. Last, we propose an enhanced density peaks clustering algorithm and compute a similarity measure between the data objects in the learned feature representation. On the cardiovascular disease dataset, the clustering accuracy is 88.32% with a standard deviation of 0.1, which is higher than that of other existing algorithms. The training time for handwritten digits datasets over 300 epochs is 3148.26 s. Experimental results on a set of five datasets demonstrate the merits of the proposed method, especially in terms of the stability and efficiency of network training. Measured in floating-point operations, the computational complexity of the proposed method is around 18% lower than that of classical generative adversarial networks.
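The preprocessing and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the SO-GAN feature learning stage is omitted entirely, the density peaks part uses the classical formulation (local density within a cutoff, and distance to the nearest denser point) rather than the enhanced variant the paper proposes, and min-max scaling is assumed as the normalization. The function names and the cutoff parameter `d_c` are hypothetical.

```python
import numpy as np

def one_hot(column):
    """One-hot encode a sequence of category labels (categories sorted)."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(column), len(categories)))
    for row, value in enumerate(column):
        encoded[row, index[value]] = 1.0
    return encoded

def min_max(column):
    """Scale a numerical column to [0, 1]; assumed normalization choice."""
    values = np.asarray(column, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)  # constant column carries no information
    return (values - values.min()) / span

def encode_mixed(categorical_cols, numerical_cols):
    """Concatenate one-hot and normalized columns into one feature matrix."""
    parts = [one_hot(c) for c in categorical_cols]
    parts += [min_max(c)[:, None] for c in numerical_cols]
    return np.hstack(parts)

def density_peaks(features, d_c):
    """Classical density peaks quantities, not the paper's enhanced version.

    rho[i]   = number of other points closer than d_c to point i
    delta[i] = distance to the nearest point with strictly higher rho
               (for the densest points: the largest distance from i)
    """
    diff = features[:, None, :] - features[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    rho = (dist < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    delta = np.empty(len(features))
    for i in range(len(features)):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta

if __name__ == "__main__":
    # Toy mixed dataset: one categorical and one numerical attribute.
    X = encode_mixed([["a", "b", "a"]], [[0, 5, 10]])
    rho, delta = density_peaks(X, d_c=1.2)
    print(X, rho, delta)
```

Points with both a large `rho` and a large `delta` are the usual candidates for cluster centers. In the method the abstract describes, the distances would be computed on the SO-GAN feature representation rather than directly on the encoded attributes as done here.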
