Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering

Abstract

Because common clustering algorithms represent documents with the vector space model (VSM), they ignore the conceptual relationships between related terms that do not literally co-occur. To overcome this problem, this article proposes a genetic algorithm-based clustering technique, named GA clustering, used in conjunction with an ontology. In general, ontology-based similarity measures fall into two categories: thesaurus-based methods and corpus-based methods. We exploit the hierarchical structure and broad-coverage taxonomy of WordNet as the thesaurus-based ontology. Corpus-based methods, however, are rather complicated to handle in practical applications, so we propose a transformed latent semantic analysis (LSA) model as the corpus-based method. In addition, two hybrid strategies that combine the various similarity measures are implemented in the clustering experiments. The results show that our GA clustering algorithm, combined with the thesaurus-based and LSA-based methods, clearly outperforms the same algorithm using other similarity measures. Furthermore, the improved clustering performance demonstrates the superiority of the proposed GA clustering algorithm over the commonly used k-means algorithm and the standard GA.
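The motivating problem in the abstract, that VSM misses related terms which never co-occur, can be made concrete with a small example. The sketch below is a minimal illustration under stated assumptions: a hypothetical four-term, five-document toy corpus, raw term counts, and standard (untransformed) LSA computed with NumPy's `np.linalg.svd`. It is not the paper's transformed LSA model, whose details are not given in this abstract. Under VSM, a document containing only "car" and one containing only "automobile" have zero cosine similarity; after a rank-2 truncated SVD, the fact that a third document uses both words places them on the same latent dimension.

```python
import numpy as np

# Hypothetical toy corpus (assumption, not from the paper): rows are terms,
# columns are documents d0..d4, entries are raw term counts (no tf-idf).
A = np.array([
    # d0 d1 d2 d3 d4
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [0, 0, 0, 1, 0],  # apple
    [0, 0, 0, 1, 1],  # fruit
], dtype=float)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(u @ v / (nu * nv))


def lsa_doc_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Map each document into a rank-k latent space via truncated SVD.

    With term_doc = U @ diag(s) @ Vt, row j of the result is row j of
    V_k @ diag(s_k), i.e. document j projected onto the top-k left
    singular vectors.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:k].T * s[:k]  # shape: (n_docs, k)


# VSM: d0 ("car") and d1 ("automobile") share no terms, so cosine is 0.
vsm_01 = cosine(A[:, 0], A[:, 1])

# LSA (k=2): d2 uses both terms, so "car" and "automobile" load on the
# same latent dimension and d0, d1 end up with (near-)identical vectors.
docs = lsa_doc_vectors(A, k=2)
lsa_01 = cosine(docs[0], docs[1])
lsa_03 = cosine(docs[0], docs[3])  # "car" vs. "apple fruit": stays ~0

print(f"VSM cos(d0, d1) = {vsm_01:.3f}")        # → 0.000
print(f"LSA cos(d0, d1) = {lsa_01:.3f}")        # → 1.000
print(f"LSA cos(d0, d3) = {abs(lsa_03):.3f}")   # → 0.000
```

Here k = 2 is chosen only because the toy corpus has two obvious topics; on a real corpus k is a tuning parameter, and a term-weighting step such as tf-idf would usually be applied before the SVD.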