Published in: Data & Knowledge Engineering

Exploiting structural similarity for effective Web information extraction

Abstract

In this paper, we propose a classification technique for Web pages based on the detection of structural similarities among semistructured documents, and devise an architecture that exploits this technique for information extraction. Unlike standard methods based on graph-matching algorithms, the proposal represents the structure of a document as a time series in which each occurrence of a tag corresponds to an impulse. The degree of similarity between documents is then assessed by analyzing the frequencies of the corresponding Fourier transforms. Experiments on real data show the effectiveness of the proposed technique.
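The abstract gives only the outline of the method, so the Python sketch below illustrates the general idea under assumptions of our own. It does not reproduce the authors' encoding or similarity measure. Tags are read with the standard library's `html.parser`. Each distinct tag name gets an integer code, and a start tag becomes an impulse of +code while an end tag becomes -code (a hypothetical scheme). Both signals are zero-padded to a common length, and pages are compared by the Euclidean distance between their DFT magnitude spectra. `structural_distance` and `classify` are illustrative names. The nearest-prototype rule is a stand-in for whatever classifier the paper actually uses.

```python
import cmath
import math
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start/end tag events in document order; text is ignored."""

    def __init__(self):
        super().__init__()
        self.events = []  # list of (tag_name, is_start) pairs

    def handle_starttag(self, tag, attrs):
        self.events.append((tag, True))

    def handle_endtag(self, tag):
        self.events.append((tag, False))


def tag_events(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.events


def to_signal(events, codes):
    # Assumed (hypothetical) encoding: one impulse per tag occurrence,
    # +code for a start tag and -code for an end tag.
    return [codes[tag] if is_start else -codes[tag] for tag, is_start in events]


def magnitude_spectrum(signal, n):
    # Naive O(n^2) DFT of the signal zero-padded to length n; keeps only
    # the magnitudes of frequencies 0..n//2.
    padded = signal + [0] * (n - len(signal))
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(padded)))
        for k in range(n // 2 + 1)
    ]


def structural_distance(html_a, html_b):
    """Euclidean distance between the DFT magnitude spectra of two pages."""
    ev_a, ev_b = tag_events(html_a), tag_events(html_b)
    # A shared vocabulary gives each tag name the same code in both pages.
    vocab = sorted({tag for tag, _ in ev_a + ev_b})
    codes = {tag: i + 1 for i, tag in enumerate(vocab)}
    n = max(len(ev_a), len(ev_b), 1)  # common length for zero-padding
    spec_a = magnitude_spectrum(to_signal(ev_a, codes), n)
    spec_b = magnitude_spectrum(to_signal(ev_b, codes), n)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b))) / n


def classify(html, prototypes):
    """Hypothetical nearest-prototype classifier: label of the closest page."""
    return min(prototypes, key=lambda label: structural_distance(html, prototypes[label]))


# Usage: pages with the same tag structure but different text are at distance 0.
prototypes = {
    "list": "<ul><li>a</li><li>b</li></ul>",
    "para": "<p>hello</p>",
}
print(classify("<ul><li>x</li><li>y</li></ul>", prototypes))  # list
```

In an extraction architecture of the kind the abstract mentions, the predicted label would presumably select which extraction rules to apply to the page. That wiring is not shown because the abstract gives no details. The naive DFT keeps the sketch dependency-free. For long pages, `numpy.fft.rfft` would be the practical replacement.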