Journal: Knowledge and Information Systems

Learning element similarity matrix for semi-structured document analysis


Abstract

Capturing latent structural and semantic properties in semi-structured documents (e.g., XML documents) is crucial for improving the performance of related document analysis tasks. The Structured Link Vector Model (SLVM) is a representation recently proposed for modeling semi-structured documents. It uses an element similarity matrix to capture the latent relationships between XML elements, the building blocks of an XML document. In this paper, instead of applying heuristics to define the element similarity matrix, we propose to compute the matrix using a machine learning approach. In addition, we incorporate term semantics into SLVM using latent semantic indexing to enhance the model's accuracy while preserving the learnability of the element similarity. For performance evaluation, we applied the similarity learning to k-nearest-neighbor search and similarity-based clustering, and tested the performance on two different XML document collections. The SLVM obtained via learning was found to significantly outperform the conventional Vector Space Model and edit-distance-based methods. In addition, the similarity matrix, obtained as a by-product, can provide higher-level knowledge about the semantic relationships between the XML elements.
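To make the role of the element similarity matrix concrete, here is a minimal sketch in Python. The abstract does not give the formula, so everything in it is an assumption made for illustration: each document is taken to be a terms-by-elements weight matrix, and the similarity is taken to be the sum of pairwise element dot products weighted by the matrix. The function name `slvm_similarity` and the toy data are hypothetical, and the matrix is written by hand here rather than learned.

```python
def slvm_similarity(doc_x, doc_y, elem_sim):
    """Similarity of two documents under an element similarity matrix.

    Assumed layout (not taken from the abstract): doc_x and doc_y are lists
    of rows, one row per term and one column per XML element; elem_sim is a
    square matrix with one row and one column per XML element.
    """
    n_elements = len(elem_sim)
    total = 0.0
    for i in range(n_elements):
        for j in range(n_elements):
            # Dot product of element i's term vector in doc_x
            # with element j's term vector in doc_y.
            dot = sum(row_x[i] * row_y[j] for row_x, row_y in zip(doc_x, doc_y))
            total += elem_sim[i][j] * dot
    return total


# Toy data: 3 terms, 2 elements (say, <title> and <abstract>).
doc_x = [[1.0, 0.0],
         [0.0, 2.0],
         [1.0, 0.0]]
doc_y = [[0.0, 1.0],
         [1.0, 0.0],
         [0.0, 1.0]]

# Identity matrix: only identical elements are compared with each other.
identity = [[1.0, 0.0],
            [0.0, 1.0]]
# Hand-written matrix that treats the two elements as partly related.
related = [[1.0, 0.5],
           [0.5, 1.0]]

print(slvm_similarity(doc_x, doc_y, identity))  # 0.0
print(slvm_similarity(doc_x, doc_y, related))   # 2.0
```

In this toy case the two documents share terms only across different elements, so the identity matrix yields no similarity while the off-diagonal entries recover it. Learning those entries from data, rather than fixing them heuristically, is what the abstract proposes.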
