Foreign-language journal: Quality Control, Transactions

Incremental Cluster Validity Indices for Online Learning of Hard Partitions: Extensions and Comparative Study

Abstract

Validation is one of the most important aspects of clustering, particularly when the user is designing a trustworthy or explainable system. However, most clustering validation approaches require batch calculation. This is an important gap because of the value of clustering in real-time data streaming and other online learning applications. Therefore, interest has grown in providing online alternatives for validation. This paper extends the incremental cluster validity index (iCVI) family by presenting incremental versions of Calinski-Harabasz (iCH), Pakhira-Bandyopadhyay-Maulik (iPBM), WB index (iWB), Silhouette (iSIL), Negentropy Increment (iNI), Representative Cross Information Potential (irCIP), Representative Cross Entropy (irH), and Conn_Index (iConn_Index). This paper also provides a thorough comparative study of the effects of correct, under-, and over-partitioning on the behavior of these iCVIs, the Partition Separation (PS) index, and four recently introduced iCVIs: incremental Xie-Beni (iXB), incremental Davies-Bouldin (iDB), and incremental generalized Dunn's indices 43 and 53 (iGD43 and iGD53). Experiments were carried out using a framework that was designed to be as agnostic as possible to the clustering algorithms. The results on synthetic benchmark data sets showed that while evidence of most under-partitioning cases could be inferred from the behaviors of the majority of these iCVIs, over-partitioning was found to be a more challenging problem, detected by fewer of them. Interestingly, over-partitioning, rather than under-partitioning, was more prominently detected in the real-world data experiments within this study. The expansion of iCVIs provides significant novel opportunities for assessing and interpreting the results of unsupervised lifelong learning in real time, wherein samples cannot be reprocessed due to memory and/or application constraints.
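
To make the idea of an incremental validity index concrete, the following is a minimal sketch of how a Calinski-Harabasz-style index could be maintained one sample at a time without storing past samples. It is not taken from the paper and may differ from the authors' iCH formulation. The class names `ClusterStats` and `IncrementalCH` are hypothetical, and the sketch assumes that an external clustering algorithm supplies a hard label for every sample and that clusters only grow (no merging or deletion). Each cluster keeps a count, a running mean, and a compactness term (the sum of squared distances to the mean), all updated with the standard running-mean recursion.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterStats:
    """Running statistics for one group of samples (hypothetical helper)."""
    n: int = 0
    mean: list = field(default_factory=list)
    cp: float = 0.0  # compactness: sum of squared distances to the mean

    def add(self, x):
        if self.n == 0:
            self.n, self.mean = 1, list(x)
            return
        # Squared distance to the mean *before* it is updated.
        d2 = sum((xi - mi) ** 2 for xi, mi in zip(x, self.mean))
        # Recursive update of the sum of squared deviations.
        self.cp += self.n / (self.n + 1) * d2
        self.n += 1
        self.mean = [mi + (xi - mi) / self.n for xi, mi in zip(x, self.mean)]


class IncrementalCH:
    """Sketch of a Calinski-Harabasz index updated per sample (assumed design)."""

    def __init__(self):
        self.clusters = {}           # label -> ClusterStats
        self.total = ClusterStats()  # tracks the global mean of all samples

    def update(self, x, label):
        """Absorb one labeled sample and return the current index value."""
        self.clusters.setdefault(label, ClusterStats()).add(x)
        self.total.add(x)
        return self.value()

    def value(self):
        k, n = len(self.clusters), self.total.n
        # Within-group sum of squares: total compactness over clusters.
        wgss = sum(c.cp for c in self.clusters.values())
        if k < 2 or n <= k or wgss == 0.0:
            return None  # index is undefined in these cases
        # Between-group sum of squares around the global mean.
        bgss = sum(
            c.n * sum((m - g) ** 2 for m, g in zip(c.mean, self.total.mean))
            for c in self.clusters.values()
        )
        return (bgss / (k - 1)) / (wgss / (n - k))
```

Calling `update(x, label)` after each assignment yields a trace of index values over the stream. Each update costs time proportional to the number of clusters times the dimensionality, independent of how many samples have been seen, which is the property that makes such indices usable when samples cannot be reprocessed.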