
Mechanisms to improve clustering uncertain data with UKmeans



Abstract

Clustering uncertain data with Kmeans, namely UKmeans, has been discussed for the past decade. UKmeans clustering, however, suffers in both time performance and effectiveness because of the uncertainty of the objects. In this study, we propose several modified UKmeans clustering mechanisms that improve time performance and effectiveness and make the clustering more complete. The main issues are (1) reducing the time cost of clustering, (2) increasing the effectiveness of clustering, and (3) determining the number of clusters. For time performance, we use simplified object expressions to reduce the time spent comparing similarities. For effectiveness, we propose compound factors, including the distance, the overlap between clusters and objects, and the cluster density, as the clustering standard for determining similarity. To further increase effectiveness, we also propose the concept of a cluster boundary, which affects the belongingness of an object through the overlapping factor. Finally, we use an approach for evaluating the number of uncertain clusters to determine an appropriate number of clusters. In the experiments, we compare the clustering results generated by strategies commonly used for uncertain data clustering in UKmeans. Our proposed model shows more favorable performance, higher clustering effectiveness, and a more appropriate number of clusters than the other models.
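The abstract does not spell out what its "simplified object expressions" are, so the following is only a minimal sketch of one well-known simplification for UKmeans, not the paper's own mechanism. It assumes each uncertain object is given as a set of samples and that similarity is the expected squared Euclidean distance. Under that assumption, the expected distance to a centroid equals the squared distance from the object's mean plus the object's total variance, so each object can be reduced to a mean vector and one scalar. All function names here (`summarize`, `assign`, and so on) are hypothetical.

```python
import numpy as np


def expected_sq_dist_bruteforce(samples, centroid):
    """Average squared Euclidean distance from every sample of one object to a centroid."""
    diffs = samples - centroid
    return float(np.mean(np.sum(diffs * diffs, axis=1)))


def summarize(samples):
    """Reduce an uncertain object to (mean vector, total variance).

    Total variance is the mean squared distance of the samples to their own mean,
    i.e. the trace of the (population) covariance matrix.
    """
    mean = samples.mean(axis=0)
    total_var = float(np.mean(np.sum((samples - mean) ** 2, axis=1)))
    return mean, total_var


def expected_sq_dist_summary(mean, total_var, centroid):
    """Same quantity as the brute-force version, computed from the summary only."""
    return float(np.sum((mean - centroid) ** 2) + total_var)


def assign(means, centroids):
    """Assign each object to its nearest centroid.

    The total variance is constant per object, so it cannot change which
    centroid is nearest; comparing the means is enough.
    """
    sq = ((means[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq.argmin(axis=1)


# Illustrative data (made up for this sketch): four uncertain objects, 200 samples each.
rng = np.random.default_rng(0)
locations = ([0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.3, 4.8])
objects = [rng.normal(loc=loc, scale=0.5, size=(200, 2)) for loc in locations]
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

summaries = [summarize(s) for s in objects]
means = np.array([m for m, _ in summaries])
labels = assign(means, centroids)

# The summary-based distance matches the sample-by-sample computation.
brute = expected_sq_dist_bruteforce(objects[0], centroids[1])
fast = expected_sq_dist_summary(*summaries[0], centroids[1])
print(labels.tolist(), np.isclose(brute, fast))
```

The summary is computed once per object, so each similarity comparison costs the same as in ordinary K-means instead of scaling with the number of samples. The compound factors named in the abstract (overlap, density, cluster boundary) are not modeled here.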

Bibliographic details

  • Source
    Data & Knowledge Engineering | 2018, Issue 7 | pp. 61-79 | 19 pages
  • Author affiliations

    Department of Computer Science and Information Engineering, National Taipei University of Technology;

    School of Computer Science and Technology, Beijing Institute of Technology;

    Department of Computer Science and Information Engineering, National Taipei University of Technology;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Uncertain data; Clustering; Centroid boundary;

