Journal: Data & Knowledge Engineering

Generate pairwise constraints from unlabeled data for semi-supervised clustering

Abstract

Pairwise constraint selection methods often rely on the label information of data to generate pairwise constraints. This paper proposes a new method for selecting pairwise constraints from unlabeled data for semi-supervised clustering, with the aim of improving clustering accuracy. Given a dataset without any label information, the dataset is first partitioned into a set of initial clusters using the I-nice method. From each initial cluster, a dense group of objects is obtained by removing the faraway objects. Then, within each dense group, the most informative object and the informative objects are identified with a local density estimation method. The identified objects are used to form a set of pairwise constraints, which are incorporated into a semi-supervised clustering algorithm to guide the clustering process toward a better solution. The advantage of this method is that no label information is required to select pairwise constraints. Experimental results on both synthetic and real-world datasets demonstrate that the new method improved clustering accuracy and outperformed four state-of-the-art pairwise constraint selection methods, namely random, FFQS, min-max, and NPU.
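The pipeline described in the abstract (initial clustering, dense-group extraction, local density ranking, constraint generation) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: I-nice is replaced by plain k-means with farthest-first initialisation; "faraway objects" are taken to be the members farthest from their cluster centroid (hypothetical parameter `keep`); local density is approximated as the inverse mean distance to the `n_neighbors` nearest neighbours; and constraints are formed by must-linking each group's most informative object to its `n_informative` next-densest objects, and cannot-linking the most informative objects of different groups. All function and parameter names are hypothetical.

```python
import math
from itertools import combinations


def kmeans(points, k, iters=100):
    """Stand-in for I-nice (assumption): Lloyd's k-means, farthest-first init."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
                if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def dense_group(indices, points, center, keep=0.8):
    """Hypothetical 'remove faraway objects': keep the closest fraction `keep`."""
    ranked = sorted(indices, key=lambda i: math.dist(points[i], center))
    return ranked[:max(1, math.ceil(keep * len(ranked)))]


def knn_density(i, group, points, n_neighbors=3):
    """Hypothetical local density: inverse mean distance to nearest neighbours."""
    dists = sorted(math.dist(points[i], points[j]) for j in group if j != i)[:n_neighbors]
    if not dists:
        return float("inf")
    mean = sum(dists) / len(dists)
    return float("inf") if mean == 0 else 1.0 / mean


def generate_constraints(points, k, keep=0.8, n_informative=2, n_neighbors=3):
    """Return (must_link, cannot_link) index pairs without using any labels."""
    labels, centers = kmeans(points, k)
    must_link, representatives = [], []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        group = dense_group(members, points, centers[c], keep)
        # Rank the dense group from highest to lowest local density.
        ranked = sorted(group, key=lambda i: knn_density(i, group, points, n_neighbors),
                        reverse=True)
        most, informative = ranked[0], ranked[1:1 + n_informative]
        representatives.append(most)
        # Assumption: informative objects are must-linked to their group's most informative one.
        must_link += [(most, j) for j in informative]
    # Assumption: most informative objects of different groups are cannot-linked.
    cannot_link = list(combinations(representatives, 2))
    return must_link, cannot_link
```

The returned index pairs could then be handed to any constraint-aware clustering algorithm (for example COP-KMeans); which semi-supervised algorithm the paper actually uses is not stated in the abstract, so this sketch stops at constraint generation.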