2011 International Conference on Management of e-Commerce and e-Government

Clustering XML Search Results Based on Content and Structure Similarity


Abstract

Clustering XML search results is an effective way to improve performance. The key problem, however, is how to measure the similarity between XML documents. In this paper, we propose a semantic similarity measure that combines content with structure. A variety of XML document features are analyzed and applied to compute the similarity between XML documents: term element frequency, term inverse element frequency, the semantic weight of the tag label, and the level information of the term. In addition, we introduce two new evaluation methodologies for clustering quality, namely Cluster Ratio Relevant and Docu Ratio Relevant. They are motivated by observations of how relevant documents are distributed and by the fact that the collection has no classification information. Experimental results show that the proposed similarity measure (CAS measure) outperforms traditional document clustering (CO measure) on both Cluster Ratio Relevant and Docu Ratio Relevant, and produces better clustering quality.
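The abstract names the features used by the content-and-structure measure but not how they are combined. The sketch below is one possible combination. It is not the paper's CAS formula, and every weighting choice in it is a hypothetical assumption:

- Each term's weight multiplies its term frequency inside an element, an inverse element frequency `log(total elements / elements containing the term)`, and a per-tag semantic weight from a made-up `TAG_WEIGHTS` table.
- It also multiplies a `1/depth` level factor.
- Documents are then compared by cosine similarity.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical semantic weights per tag label (not taken from the paper).
TAG_WEIGHTS = {"title": 2.0, "keyword": 1.5}
DEFAULT_TAG_WEIGHT = 1.0


def elements(xml_text):
    """Return (tag, depth, terms) for every element; only each element's own .text is tokenized."""
    root = ET.fromstring(xml_text)
    out, stack = [], [(root, 1)]
    while stack:
        elem, depth = stack.pop()
        terms = re.findall(r"[a-z0-9]+", (elem.text or "").lower())
        out.append((elem.tag, depth, terms))
        stack.extend((child, depth + 1) for child in elem)
    return out


def ief_table(docs):
    """Inverse element frequency over all elements in the collection."""
    total, containing = 0, Counter()
    for doc in docs:
        for _tag, _depth, terms in elements(doc):
            total += 1
            containing.update(set(terms))
    return {t: math.log(total / n) for t, n in containing.items()}


def weight_vector(xml_text, ief):
    """Assumed combination: tf * ief * tag weight * (1 / depth), summed per term."""
    vec = defaultdict(float)
    for tag, depth, terms in elements(xml_text):
        tag_w = TAG_WEIGHTS.get(tag, DEFAULT_TAG_WEIGHT)
        for term, tf in Counter(terms).items():
            vec[term] += tf * ief.get(term, 0.0) * tag_w / depth
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    "<paper><title>xml clustering</title><body>structure similarity</body></paper>",
    "<paper><title>xml search</title><body>content similarity</body></paper>",
    "<recipe><name>apple pie</name><step>bake</step></recipe>",
]
ief = ief_table(docs)
vecs = [weight_vector(d, ief) for d in docs]
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Here the two `paper` documents share weighted terms and score above zero. The `recipe` document shares none and scores 0.0. A matrix of such pairwise scores could then feed any standard clustering algorithm.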
