International Computer Symposium

An Effective Clustering Mechanism for Uncertain Data Mining Using Centroid Boundary in UKmeans

Abstract

Object errors affect both the time cost and the effectiveness of uncertain data clustering. To decrease the time cost and increase the effectiveness, we propose two mechanisms for the centroid-based clustering algorithm UKmeans. The first mechanism is an improved similarity. Similarity is an intuitive factor that directly affects both time cost and effectiveness. For example, similarity calculations based on integration focus on clustering effectiveness but ignore the time cost; conversely, simplified similarity calculations address the time cost but ignore effectiveness. To balance the two, we use a simplified similarity to reduce the time cost and add two further factors, the intersection and the density of clusters, to increase clustering effectiveness. The intersection factor increases an object's degree of belonging to a cluster when the cluster overlaps the object. The density factor prevents objects from being attracted by clusters that have large errors. The second mechanism is the definition of a centroid boundary. In clustering, the possible position of a cluster centroid lies within an average range derived from the errors of the objects belonging to the cluster. However, a large average range lowers clustering effectiveness. To decrease this range, we propose the square root boundary mechanism, which limits the upper bound of the possible centroid positions and thereby increases clustering effectiveness. Experimental results suggest that the two mechanisms perform well in terms of both time cost and effectiveness, and that they complement the UKmeans approaches to uncertain data clustering.
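The abstract does not give formulas for the intersection and density factors or for the square root boundary, so the following is not the authors' method. It is a minimal sketch, under stated assumptions, of the baseline those mechanisms build on: UKmeans with a simplified similarity, here the closed-form expected squared distance E[||X - c||^2] = ||mu - c||^2 + (sum of per-dimension variances), assuming each uncertain object is given as a mean vector plus a variance vector. Because the variance term is the same for every cluster, assignment with this similarity reduces to ordinary K-means on the object means, which illustrates why a purely simplified similarity is fast but can lose effectiveness. The function `centroid_error_bounds` is a hypothetical illustration of why a square-root bound is tighter than an average: for n independent errors with standard deviations r_i, the spread of their mean is sqrt(sum r_i^2)/n, which never exceeds the plain average sum(r_i)/n. All names here (`uk_means`, `expected_sq_distance`, `centroid_error_bounds`) are hypothetical.

```python
import math
import random

# Assumption: an uncertain object is a (mean vector, per-dimension variance vector) pair.
UncertainObject = tuple[list[float], list[float]]


def expected_sq_distance(obj: UncertainObject, centroid: list[float]) -> float:
    """Simplified similarity: E[||X - c||^2] = ||mu - c||^2 + sum of variances."""
    mean, var = obj
    sq = sum((m - c) ** 2 for m, c in zip(mean, centroid))
    return sq + sum(var)


def uk_means(objects: list[UncertainObject], k: int, iters: int = 20, seed: int = 0):
    """UKmeans sketch using the closed-form expected distance (no intersection/density factors)."""
    rng = random.Random(seed)
    # Initialise centroids at the means of k randomly chosen objects.
    centroids = [list(o[0]) for o in rng.sample(objects, k)]
    labels = [0] * len(objects)
    for _ in range(iters):
        # Assignment step: nearest centroid by expected squared distance.
        labels = [
            min(range(k), key=lambda j: expected_sq_distance(o, centroids[j]))
            for o in objects
        ]
        # Update step: the mean of member means minimises the summed expected distance.
        for j in range(k):
            members = [o[0] for o, lab in zip(objects, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def centroid_error_bounds(radii: list[float]) -> tuple[float, float]:
    """Hypothetical comparison: (average bound, square-root bound) on a centroid's error range."""
    n = len(radii)
    avg_bound = sum(radii) / n
    sqrt_bound = math.sqrt(sum(r * r for r in radii)) / n
    return avg_bound, sqrt_bound


if __name__ == "__main__":
    objs = [
        ([0.0, 0.0], [0.1, 0.1]),
        ([0.2, 0.1], [0.3, 0.2]),
        ([5.0, 5.0], [0.1, 0.1]),
        ([5.1, 4.9], [0.5, 0.5]),
    ]
    cents, labs = uk_means(objs, k=2)
    print(labs)
    print(centroid_error_bounds([1.0, 1.0, 1.0, 1.0]))  # (1.0, 0.5)
```

Replacing `expected_sq_distance` with a score that also rewards overlap and penalises high-error clusters is where mechanisms like the paper's intersection and density factors would plug in; their exact form would have to come from the full paper.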