Conference: 2016 International Computer Symposium

An Effective Clustering Mechanism for Uncertain Data Mining Using Centroid Boundary in UKmeans

Abstract

Object errors affect both the time cost and the effectiveness of uncertain data clustering. To decrease the time cost and increase the effectiveness, we propose two mechanisms for the centroid-based clustering method UKmeans. The first mechanism is an improved similarity measure. Similarity is an intuitive factor that directly affects both time cost and effectiveness. For example, integration-based similarity calculations focus on the effectiveness of clustering but ignore the time cost. Conversely, simplified similarity calculations address the time cost but ignore the effectiveness. In this study, to account for both, we use a simplified similarity to reduce the time cost and add two factors, namely the intersection and the density of clusters, to increase the effectiveness of clustering. The former factor increases an object's degree of belonging to a cluster when that cluster overlaps the object. The latter factor prevents objects from being attracted to clusters that have large errors. The second proposed mechanism is the definition of the centroid boundary. In clustering, the position of a cluster centroid lies within an average range derived from the errors of the objects belonging to the cluster. However, a large average range lowers the effectiveness of clustering. To decrease this range, we propose the square root boundary mechanism, which limits the upper bound of the possible positions of centroids and thereby increases the effectiveness of clustering. The experimental results suggest that the two mechanisms perform well in both time cost and effectiveness, and that they complement the UKmeans approaches to uncertain data clustering.
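
The abstract names the ingredients but gives no formulas, so the following Python sketch is only one possible reading and should not be taken as the paper's definitions. It assumes that each object's error is a ball of some radius around a mean position, that the simplified similarity is the squared distance between means, that the "square root boundary" divides the average member radius by the square root of the cluster size, and that the intersection and density factors act as simple multipliers. All names (`UncertainObject`, `sqrt_bounded_radius`, `overlap_bonus`, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class UncertainObject:
    """An object known only up to an error (assumed: a ball around its mean)."""
    mean: Tuple[float, ...]
    radius: float


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members: Sequence[UncertainObject]) -> Tuple[float, ...]:
    """Cluster centroid computed from the member means only."""
    n = len(members)
    dim = len(members[0].mean)
    return tuple(sum(o.mean[d] for o in members) / n for d in range(dim))


def average_radius(members: Sequence[UncertainObject]) -> float:
    """Centroid range taken as the plain average of the member error radii."""
    return sum(o.radius for o in members) / len(members)


def sqrt_bounded_radius(members: Sequence[UncertainObject]) -> float:
    """Assumed square-root bound: average radius shrunk by sqrt(cluster size)."""
    return average_radius(members) / math.sqrt(len(members))


def overlaps(obj: UncertainObject, center: Sequence[float],
             cluster_radius: float) -> bool:
    """True when the object's error ball intersects the cluster's ball."""
    return math.sqrt(sq_dist(obj.mean, center)) <= obj.radius + cluster_radius


def score(obj: UncertainObject, center: Sequence[float],
          cluster_radius: float, overlap_bonus: float = 0.5) -> float:
    """Dissimilarity (lower is better) under the assumptions stated above."""
    d = sq_dist(obj.mean, center)      # simplified similarity: means only
    if overlaps(obj, center, cluster_radius):
        d *= overlap_bonus             # intersection factor (assumed form)
    return d * (1.0 + cluster_radius)  # density factor (assumed form)


if __name__ == "__main__":
    members = [UncertainObject(m, 1.0)
               for m in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]]
    c = centroid(members)
    probe = UncertainObject((2.2, 1.0), 0.5)
    print(c, average_radius(members), sqrt_bounded_radius(members))
    print(overlaps(probe, c, average_radius(members)),
          overlaps(probe, c, sqrt_bounded_radius(members)))
```

In this toy setup, four members with error radius 1.0 give an average radius of 1.0 and a bounded radius of 0.5, so a nearby probe object that overlaps the cluster under the average range no longer overlaps it under the tighter bound. This illustrates why a smaller centroid range can change assignments, but the actual weighting used in the paper may differ.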
