Published in: Information Sciences: An International Journal

Fuzzy control GA with a novel hybrid semantic similarity strategy for text clustering

Abstract

This paper proposes a fuzzy control genetic algorithm (GA) combined with a novel hybrid semantic similarity measure for document clustering. Common clustering algorithms represent documents with the vector space model (VSM), which ignores the conceptual relationships between related terms; we use semantic similarity measures to address this problem. Semantic similarity measures fall broadly into two categories: thesaurus-based methods and corpus-based methods. In practice, however, corpus-based methods are rather complicated to apply. We propose and demonstrate a semantic space model (SSM) as the corpus-based method, in which appropriately reduced dimensions capture the underlying relationships between documents in terms of concepts rather than specific terms. The thesaurus-based method is then combined with our SSM as a hybrid strategy for measuring semantic similarity. In a GA, the balance between the capability to converge to an optimum (exploitation) and the capacity to explore new solutions (exploration) determines whether the search succeeds in finding the global optimum. We use a fuzzy control GA to adaptively adjust the relative influence of these two factors. Two textual data sets, drawn from the Reuters document collection and the 20 Newsgroups corpus, are tested in our experiments. The results show that our fuzzy control GA combined with the hybrid semantic similarity strategy clearly outperforms the conventional GA, fuzzy c-means (FCM), and K-means using the traditional cosine similarity in the VSM. Moreover, comparisons with the conventional GA under the same similarity measures demonstrate the advantages of both the fuzzy control GA and the hybrid semantic strategy.
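The abstract does not spell out how the corpus-based and thesaurus-based scores are combined. The sketch below is one plausible reading under loudly stated assumptions: the SSM is approximated by LSA-style truncated SVD of the term-document matrix, the thesaurus is represented by a hypothetical term-term similarity matrix `term_sim` used in a soft-cosine formula, and the two scores are mixed with a hypothetical weight `alpha`. The function names and formulas are illustrative, not the authors' method.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Plain cosine similarity (the VSM baseline); 0.0 if either vector is zero."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(u @ v / (nu * nv))


def semantic_space(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Corpus-based part (assumed LSA-style): keep the top-k SVD dimensions.

    term_doc has shape (n_terms, n_docs); returns an (n_docs, k) array whose
    rows are the document coordinates V_k * Sigma_k.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:k].T * s[:k]


def thesaurus_similarity(u: np.ndarray, v: np.ndarray, term_sim: np.ndarray) -> float:
    """Thesaurus-based part: soft cosine over a term-term similarity matrix.

    term_sim is a hypothetical input (e.g. synonym scores from a thesaurus);
    it should be symmetric positive semidefinite so the norms are real.
    """
    num = u @ term_sim @ v
    den = np.sqrt(u @ term_sim @ u) * np.sqrt(v @ term_sim @ v)
    return float(num / den) if den > 0 else 0.0


def hybrid_similarity(i, j, term_doc, doc_coords, term_sim, alpha=0.5) -> float:
    """Hypothetical hybrid: alpha * corpus score + (1 - alpha) * thesaurus score."""
    s_corpus = cosine(doc_coords[i], doc_coords[j])
    s_thes = thesaurus_similarity(term_doc[:, i], term_doc[:, j], term_sim)
    return alpha * s_corpus + (1.0 - alpha) * s_thes


# Toy corpus. Rows (terms): car, automobile, engine, fruit, apple.
# Columns (documents): d0 = car engine, d1 = automobile engine,
# d2 = fruit, d3 = fruit apple.
A = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)
S = np.eye(5)
S[0, 1] = S[1, 0] = 0.9  # assumed thesaurus score for car ~ automobile
coords = semantic_space(A, k=2)

print(round(cosine(A[:, 0], A[:, 1]), 3))               # VSM baseline: 0.5
print(round(hybrid_similarity(0, 1, A, coords, S), 3))  # hybrid: 0.975
```

In this toy example "car" and "automobile" never co-occur, so the plain VSM cosine between d0 and d1 is only 0.5. The reduced semantic space (through the shared term "engine") and the assumed thesaurus score both raise the hybrid similarity, while unrelated documents such as d0 and d2 stay near 0.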
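The abstract also leaves the fuzzy rule base unspecified. A minimal sketch, assuming a binary-encoded population and a zero-order Sugeno-style controller: population diversity is fuzzified into LOW, MEDIUM and HIGH with triangular membership functions, and the rules raise the mutation rate when diversity is low (favoring exploration) and lower it, while raising the crossover rate, when diversity is high (favoring convergence). The rule outputs, membership breakpoints, and the names `tri`, `population_diversity`, `RULES` and `fuzzy_rates` are all hypothetical, not the authors' controller.

```python
from typing import List, Tuple


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def population_diversity(pop: List[List[int]]) -> float:
    """Genotypic diversity of a binary population, in [0, 1].

    Per locus, 4*p*(1-p) is 0 when all individuals agree and 1 when exactly
    half of them carry a 1; the result is the mean over all loci.
    """
    n, length = len(pop), len(pop[0])
    total = 0.0
    for locus in range(length):
        p = sum(ind[locus] for ind in pop) / n
        total += 4.0 * p * (1.0 - p)
    return total / length


# Hypothetical rule base: (membership params, crossover rate, mutation rate).
RULES = [
    ((-0.5, 0.0, 0.5), 0.60, 0.20),  # diversity LOW    -> explore: high mutation
    ((0.0, 0.5, 1.0), 0.80, 0.05),   # diversity MEDIUM -> balanced
    ((0.5, 1.0, 1.5), 0.95, 0.01),   # diversity HIGH   -> converge: low mutation
]


def fuzzy_rates(diversity: float) -> Tuple[float, float]:
    """Zero-order Sugeno inference: firing-strength-weighted average of rule outputs."""
    d = min(max(diversity, 0.0), 1.0)
    weights = [tri(d, *mf) for mf, _, _ in RULES]
    total = sum(weights)  # always > 0 on [0, 1] with these breakpoints
    pc = sum(w * c for w, (_, c, _) in zip(weights, RULES)) / total
    pm = sum(w * m for w, (_, _, m) in zip(weights, RULES)) / total
    return pc, pm


pop = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]  # fully converged population
print(fuzzy_rates(population_diversity(pop)))       # (0.6, 0.2): boost mutation
```

Because the rates are a weighted average of neighboring rules, they change smoothly as diversity drifts instead of jumping between fixed settings; for example, a diversity of 0.25 fires LOW and MEDIUM equally and yields a mutation rate of 0.125.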
