Journal of Intelligent Information Systems

An Enhanced Web Page Change Detection Approach Based On Limiting Similarity Computations To Elements Of Same Type

Abstract

This paper describes an efficient Web page change detection approach that restricts the similarity computations between two versions of a given Web page to nodes with the same HTML tag type. Before the similarity computations are performed, the HTML page is transformed into an XML-like structure in which each node corresponds to a pair of opening and closing HTML tags. Analytical expressions and supporting experimental results quantify the improvements of the proposed approach over the traditional one, which computes similarities across all nodes of both pages. The improvements are shown to depend strongly on the diversity of tags in the page: the more diverse the page (i.e., the more it mixes text, images, links, etc.), the greater the improvements; the more uniform the page, the smaller they are.
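
The idea of limiting comparisons to nodes of the same tag type can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the similarity measure or the analytical expressions, so the text similarity used here (`difflib.SequenceMatcher`), the flat node list, and the names `NodeCollector`, `parse_nodes`, and `best_matches` are all assumptions made for illustration. The sketch counts how many node-to-node comparisons each strategy performs on two versions of a small page.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements without a closing tag; this sketch skips them (a simplification).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class NodeCollector(HTMLParser):
    """Collects one (tag, direct_text) node per opening/closing tag pair."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as [tag, text_parts]
        self.nodes = []  # finished nodes as (tag, text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append([tag, []])

    def handle_data(self, data):
        # Text is attached to the innermost open element only.
        if self.stack and data.strip():
            self.stack[-1][1].append(data.strip())

    def handle_endtag(self, tag):
        # Close the innermost open element with this name, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                self.nodes.append((tag, " ".join(self.stack[i][1])))
                del self.stack[i:]
                break


def parse_nodes(html):
    collector = NodeCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def best_matches(old_nodes, new_nodes, same_type_only):
    """For each old node, find its best similarity among the new nodes.

    Returns (results, comparisons), where results holds (tag, text, best).
    """
    by_tag = defaultdict(list)
    for node in new_nodes:
        by_tag[node[0]].append(node)

    comparisons = 0
    results = []
    for tag, text in old_nodes:
        candidates = by_tag[tag] if same_type_only else new_nodes
        best = 0.0
        for _, other_text in candidates:
            comparisons += 1
            best = max(best, SequenceMatcher(None, text, other_text).ratio())
        results.append((tag, text, best))
    return results, comparisons


# Hypothetical example pages: one paragraph changes between versions.
OLD = "<div><h1>News</h1><p>Rain today</p><p>Sunny tomorrow</p><a>More</a></div>"
NEW = "<div><h1>News</h1><p>Rain today</p><p>Cloudy tomorrow</p><a>More</a></div>"

old_nodes, new_nodes = parse_nodes(OLD), parse_nodes(NEW)
_, all_pairs = best_matches(old_nodes, new_nodes, same_type_only=False)
typed, same_type = best_matches(old_nodes, new_nodes, same_type_only=True)
changed = [(tag, text) for tag, text, best in typed if best < 1.0]

print("all-pairs comparisons:", all_pairs)
print("same-type comparisons:", same_type)
print("changed nodes:", changed)
```

On this example each version yields five nodes (`h1`, two `p`, `a`, `div`), so the all-pairs strategy performs 5 × 5 = 25 comparisons, while the same-type strategy performs 1 + 2·2 + 1 + 1 = 7. If every node had the same tag, the two counts would be equal, which is consistent with the abstract's remark that uniform pages benefit least.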