Journal: Data & Knowledge Engineering

A Fast HTML Web Page Change Detection Approach Based on Hashing and Reducing the Number of Similarity Computations


Abstract

This paper describes a fast change detection approach for HTML web pages. It saves computation time in two ways: by limiting the similarity computations between two versions of a web page to nodes that have the same HTML tag type, and by hashing the web page to provide direct access to node information. The approach is efficient enough to run as a client application and to underpin server applications that let users monitor modifications made to HTML web pages over time, and that report and visualize changes and trends to give insight into the significance and types of those changes. Changes across two versions of a page are detected by performing similarity computations after transforming the web page into an XML-like structure in which each node corresponds to an open-close HTML tag pair. Performance and detection reliability results were obtained and showed speed improvements over a previous approach.
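
The two ideas in the abstract (comparing only nodes with the same tag type, and hashing the page for direct node lookup) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the names `NodeCollector`, `index_page`, and `detect_changes`, the 0.6 threshold, the use of a plain dictionary keyed by tag name as the hash structure, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Tags that never have a closing tag; skipped so they do not swallow text.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class NodeCollector(HTMLParser):
    """Bucket the direct text of each open-close tag pair by tag name."""

    def __init__(self):
        super().__init__()
        self.buckets = defaultdict(list)  # tag name -> list of node texts
        self._stack = []                  # currently open nodes: (tag, text pieces)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append((tag, []))

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self._stack[-1][1].append(text)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; tolerate unbalanced markup.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                pieces = self._stack[i][1]
                del self._stack[i:]
                self.buckets[tag].append(" ".join(pieces))
                break


def index_page(html):
    """Hash a page into {tag: [node text, ...]} for direct access by tag."""
    parser = NodeCollector()
    parser.feed(html)
    parser.close()
    return parser.buckets


def detect_changes(old_html, new_html, threshold=0.6):
    """Report inserted, deleted, and modified nodes between two versions."""
    old, new = index_page(old_html), index_page(new_html)
    changes = []
    for tag in sorted(set(old) | set(new)):
        old_nodes = list(old.get(tag, []))
        new_nodes = list(new.get(tag, []))
        # Pass 1: identical nodes are unchanged and need no similarity score.
        for text in old_nodes[:]:
            if text in new_nodes:
                old_nodes.remove(text)
                new_nodes.remove(text)
        # Pass 2: score leftovers only against candidates with the same tag.
        for text in old_nodes:
            best, best_score = None, 0.0
            for candidate in new_nodes:
                score = SequenceMatcher(None, text, candidate).ratio()
                if score > best_score:
                    best, best_score = candidate, score
            if best is not None and best_score >= threshold:
                new_nodes.remove(best)
                changes.append(("modified", tag, text, best))
            else:
                changes.append(("deleted", tag, text, None))
        for text in new_nodes:
            changes.append(("inserted", tag, None, text))
    return changes
```

Restricting pass 2 to a single bucket is what cuts the number of similarity computations: a leftover `p` node is scored only against leftover `p` nodes in the new version, never against every node on the page.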
