Conference: IEEE/WIC/ACM International Conference on Web Intelligence
Integrating element and term semantics for similarity-based XML document clustering

Abstract

The structured link vector model (SLVM) is a recently proposed document representation that takes both structural and semantic information into account when measuring XML document similarity. Its formulation includes an element similarity matrix that captures the semantic similarity between XML elements, the structural components of XML documents. In this paper, instead of applying heuristics to define the similarity matrix, we propose to learn it iteratively from training data consisting of pairwise-similar documents. In addition, we extend SLVM to SLVM-LSI by incorporating term semantics through latent semantic indexing, while preserving the element-similarity properties of the original SLVM. For performance evaluation, we applied SLVM-LSI to similarity-based clustering of two XML datasets, and it was found to significantly outperform the conventional vector space model and edit-distance-based methods. The similarity matrix, obtained as a byproduct of the learning, can provide higher-level knowledge about the semantic relationships between the XML elements.
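The role of the element similarity matrix can be illustrated with a small sketch. The abstract does not give the formula, so the bilinear form below (a sum over element pairs of the matrix entry times the dot product of the two documents' term vectors for those elements) is an assumption, and the function name `slvm_similarity`, the toy documents, and the 0.8 entry are hypothetical. It is not the authors' implementation and omits both the matrix learning and the LSI step.

```python
def slvm_similarity(doc_x, doc_y, element_sim):
    """Assumed SLVM-style similarity between two XML documents.

    doc_x, doc_y: term-by-element weight matrices (lists of rows),
                  where entry [t][e] is the weight of term t inside element e.
    element_sim:  element-by-element similarity matrix.
    """
    n_terms = len(doc_x)
    n_elems = len(element_sim)
    total = 0.0
    for i in range(n_elems):
        for j in range(n_elems):
            weight = element_sim[i][j]
            if weight == 0.0:
                continue  # unrelated elements contribute nothing
            # Dot product of element i's term vector in doc_x
            # with element j's term vector in doc_y.
            dot = sum(doc_x[t][i] * doc_y[t][j] for t in range(n_terms))
            total += weight * dot
    return total


# Toy data: 3 terms, 2 elements (hypothetical, e.g. "title" and "heading").
doc_a = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # term 0 in element 0, term 2 in element 1
doc_b = [[0.0, 1.0], [0.0, 0.0], [0.0, 1.0]]  # terms 0 and 2 both in element 1

identity = [[1.0, 0.0], [0.0, 1.0]]  # elements treated as unrelated
related = [[1.0, 0.8], [0.8, 1.0]]   # elements treated as semantically close

sim_identity = slvm_similarity(doc_a, doc_b, identity)
sim_related = slvm_similarity(doc_a, doc_b, related)
```

With the identity matrix, the shared term 0 is ignored because it sits in different elements of the two documents; with the non-zero off-diagonal entry, that match contributes to the score. This is the kind of cross-element relationship a learned matrix would be expected to encode.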