Conference: International Conference on Intelligent and Adaptive Systems and Software Engineering

Vertical Set Square Distance Based Clustering without Prior Knowledge of K

Abstract

Clustering is the automated identification of groups of objects based on similarity. Two major research issues in clustering are scalability and the need for domain knowledge to determine input parameters. Most approaches address scalability through sampling. However, sampling does not guarantee the best solution and can cause a significant loss in accuracy. Most approaches also rely on domain knowledge, trial and error, or exhaustive search to determine the required input parameters. In this paper we introduce a new clustering technique based on the set square distance. Cluster membership is determined by the set square distance to the respective cluster. Where k-means represents a cluster by its mean and k-medoids by its medoid, here the cluster is represented by its entire set of points for each evaluation of membership. The set square distance for all n items can be computed efficiently in O(n) using a vertical data structure and a few pre-computed values. A special ordering of the set square distances is used to break the data into its "natural" clusters, in contrast to k-means or k-medoids style partition clustering, which needs a known k. The new technique produces superior results when compared with classical k-means clustering. To demonstrate cluster quality and the resolution of the unknown k, data sets with known classes are used: the iris data, the uci_kdd network intrusion data, and synthetic data. The scalability of the proposed technique is demonstrated on a large RSI data set.
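The O(n) claim in the abstract can be illustrated with a small sketch. It assumes that the set square distance of a point a to a set X is the sum of squared Euclidean distances from a to every point of X; the abstract does not spell out the definition, so this is an assumption. Under that assumption the expansion sum ||x - a||^2 = sum ||x||^2 - 2 a·(sum x) + |X| ||a||^2 lets two pre-computed aggregates stand in for the whole set. The function names are hypothetical, and the paper's vertical data structure is not reproduced here; plain Python lists are used instead.

```python
from typing import List, Sequence


def set_square_distances(points: Sequence[Sequence[float]]) -> List[float]:
    """For each point a, return the sum over all x in points of ||x - a||^2.

    Assumed definition (not confirmed by the abstract). Runs in O(n * d)
    for n points of dimension d, i.e. linear in n for a fixed dimension.
    """
    n = len(points)
    if n == 0:
        return []
    d = len(points[0])

    # Pre-computed aggregates, gathered in a single pass over the data.
    coord_sums = [0.0] * d   # per-dimension sum of coordinates
    sq_norm_sum = 0.0        # sum of ||x||^2 over all points
    for x in points:
        for j in range(d):
            coord_sums[j] += x[j]
            sq_norm_sum += x[j] * x[j]

    # Second pass: each point costs O(d) because the aggregates replace the set.
    result = []
    for a in points:
        a_sq = sum(v * v for v in a)
        dot = sum(a[j] * coord_sums[j] for j in range(d))
        result.append(sq_norm_sum - 2.0 * dot + n * a_sq)
    return result


def set_square_distances_brute(points: Sequence[Sequence[float]]) -> List[float]:
    """Reference O(n^2 * d) version, used only to check the fast one."""
    return [
        sum(sum((xj - aj) ** 2 for xj, aj in zip(x, a)) for x in points)
        for a in points
    ]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    print(set_square_distances(pts))
    print(set_square_distances_brute(pts))
```

Both functions return the same values; only the first avoids the pairwise loop. How the paper orders these distances to separate clusters is not described in the abstract and is not sketched here.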
