Conference: Information Retrieval Technology

Clustering Deep Web Databases Semantically

Abstract

Clustering Deep Web databases is a key operation in organizing Deep Web resources. Traditional approaches compute similarity as the cosine similarity in the Vector Space Model (VSM), which cannot capture the semantic similarity between the contents of two databases. This paper discusses how to cluster Deep Web databases semantically. First, a fuzzy semantic measure is proposed that integrates ontology and fuzzy set theory to compute the semantic similarity between the visible features of two Deep Web forms. A hybrid Particle Swarm Optimization (PSO) algorithm is then provided for clustering Deep Web databases. Finally, the clustering results are evaluated according to the Average Similarity of Document to the Cluster Centroid (ASDC) and the Rand Index (RI). Experiments show that: 1) the hybrid PSO approach yields higher ASDC values than the PSO and K-Means approaches, meaning that it achieves higher intra-cluster similarity and lower inter-cluster similarity; 2) clustering results based on fuzzy semantic similarity have higher ASDC and RI values than those based on cosine similarity, which suggests that the fuzzy semantic similarity approach can explore latent semantics.
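
To make two of the quantities named in the abstract concrete, here is a minimal Python sketch of the baseline cosine similarity over term-frequency vectors and of the Rand Index between two labelings. It is not the paper's implementation: the function names, the bag-of-words representation, and the sample form labels are illustrative assumptions, and the fuzzy semantic measure, the hybrid PSO algorithm, and ASDC are not reproduced because the abstract does not define them.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rand_index(labels_true, labels_pred) -> float:
    """Fraction of item pairs on which two partitions agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0
    # A pair agrees when both partitions put it together, or both split it.
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical visible labels from two book-search forms (assumed data).
form_a = Counter("author title isbn".split())
form_b = Counter("writer name isbn".split())

print(cosine_similarity(form_a, form_b))
print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

In this toy input only "isbn" matches literally, so the related labels "author" and "writer" contribute nothing to the cosine score. That is the kind of gap a semantic measure is meant to close.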