Conference proceedings: Progress in WWW Research and Development

Similarity Computation for XML Documents by XML Element Sequence Patterns

Abstract

Measuring the similarity between XML documents is a fundamental task in finding clusters in a collection of XML documents. In this paper, an XML document is modeled as an XML Element Sequence Pattern (XESP), which can be extracted using less time and space than other models such as the tree model and the frequent paths model. Similarity between XML documents is then measured based on their XESPs. Frequent path pattern models ignore hierarchical information, and tree models ignore semantic information. To address these deficiencies, both the semantics of the elements and the hierarchical structure of the document are taken into account when computing the similarity between XML documents by XESPs. Experimental results show that perfect clustering is obtained when a proper threshold is applied to the similarity computed by XESPs.
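
The abstract does not give the formal definition of an XESP or of the similarity measure, so the following is only a minimal sketch of the general idea, under stated assumptions. It approximates an element sequence as the list of (tag, depth) pairs in document order, which keeps hierarchical information, and optionally normalizes tags through a `synonyms` map as a crude stand-in for element semantics. The function names, the `synonyms` parameter, and the use of `difflib.SequenceMatcher` as the sequence similarity are all illustrative assumptions, not the paper's method.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def element_sequence(xml_text, synonyms=None):
    """Return (tag, depth) pairs in document order.

    This is an assumed stand-in for an XESP, not the paper's definition.
    `synonyms` is a hypothetical map that rewrites tags to a canonical name.
    """
    synonyms = synonyms or {}
    root = ET.fromstring(xml_text)
    sequence = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        sequence.append((synonyms.get(node.tag, node.tag), depth))
        # Push children in reverse so they are popped in document order.
        for child in reversed(list(node)):
            stack.append((child, depth + 1))
    return sequence


def sequence_similarity(seq_a, seq_b):
    """Similarity in [0, 1] based on matching subsequences (difflib ratio)."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def same_cluster(xml_a, xml_b, threshold, synonyms=None):
    """Decide whether two documents are similar enough to share a cluster."""
    seq_a = element_sequence(xml_a, synonyms)
    seq_b = element_sequence(xml_b, synonyms)
    return sequence_similarity(seq_a, seq_b) >= threshold
```

For example, `same_cluster("<book><title/><author/></book>", "<book><title/><writer/></book>", 0.8)` compares two three-element sequences that differ in one tag. Passing `synonyms={"writer": "author"}` makes the two sequences identical, which illustrates why taking element semantics into account can change the clustering outcome.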
