Conference: IEEE International Conference on Data Mining Workshops

New Quality Indexes for Optimal Clustering Model Identification with High Dimensional Data



Abstract

Feature maximization is an alternative to the usual distributional measures, which rely on entropy or the Chi-square metric, and to vector-based measures such as Euclidean distance or correlation distance. One key advantage of this measure is that it operates incrementally in both clustering and traditional classification. In the classification framework, it does not suffer from the limitations of the aforementioned measures when processing highly unbalanced, heterogeneous, and highly multidimensional data. We present a new application of this measure in the clustering context: the creation of new cluster quality indexes that can be applied efficiently to data ranging from low to high dimensionality and that are tolerant to noise. We compare the behavior of these new indexes with that of usual cluster quality indexes based on Euclidean distance, using different kinds of test datasets for which ground truth is available. This comparison clearly highlights the superior accuracy and stability of the new method.
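The abstract does not define the feature maximization measure itself. As a rough illustration only, the sketch below assumes one common formulation, in which each feature of a cluster receives a "feature F-measure": the harmonic mean of feature recall (the share of the feature's total weight that falls in the cluster) and feature predominance (the share of the cluster's total weight carried by the feature). The function name `feature_f_measure`, the toy data, and the formulation are assumptions, not taken from this record, and the quality indexes proposed in the paper are not reproduced here.

```python
def feature_f_measure(data, labels):
    """Return {cluster: [feature F-measure per feature]}.

    Assumed formulation (not stated in the abstract):
      recall(c, f)       = weight of f in c / weight of f over all clusters
      predominance(c, f) = weight of f in c / weight of all features in c
      F(c, f)            = harmonic mean of recall and predominance
    `data` is a list of rows of non-negative feature weights.
    """
    n_features = len(data[0])
    clusters = sorted(set(labels))

    # Sum of each feature's weight inside each cluster.
    sums = {c: [0.0] * n_features for c in clusters}
    for row, c in zip(data, labels):
        for f, w in enumerate(row):
            sums[c][f] += w

    # Total weight of each feature over all clusters.
    feature_totals = [sum(sums[c][f] for c in clusters) for f in range(n_features)]

    result = {}
    for c in clusters:
        cluster_total = sum(sums[c])
        scores = []
        for f in range(n_features):
            recall = sums[c][f] / feature_totals[f] if feature_totals[f] else 0.0
            predominance = sums[c][f] / cluster_total if cluster_total else 0.0
            denom = recall + predominance
            scores.append(2 * recall * predominance / denom if denom else 0.0)
        result[c] = scores
    return result


# Toy example: feature 0 occurs only in cluster 0, feature 1 only in cluster 1.
toy_scores = feature_f_measure([[1, 0], [1, 0], [0, 1]], [0, 0, 1])
# toy_scores == {0: [1.0, 0.0], 1: [0.0, 1.0]}
```

Under this assumed formulation, a feature scores high for a cluster only when it is both concentrated in that cluster and prominent within it. No distance computation is involved, which is consistent with the abstract's contrast between this measure and Euclidean-based measures.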
