2012 IEEE 12th International Conference on Computer and Information Technology

Deep Web Repeated Pattern Discovering Based on the Largest Block Strategy

Abstract

Repeated patterns are common in the query result pages of deep web sites, and mining them gives access to the back-end data of the deep web. So far, most algorithms for discovering repeated patterns rely on traditional web information extraction methods, whose recall and accuracy are not high. Obtaining the repeated pattern accurately and completely remains difficult. We propose a method that discovers such patterns based on the largest block strategy. The core of the method is to use the largest block strategy to locate the repeated pattern layer: we quickly navigate to the region that holds the entity data, analyze the subtree in this region, and finally obtain the simplified repeated pattern of the deep web site. According to the experimental results, the method obtains repeated pattern data more accurately and more completely than the traditional methods. It also addresses the multi-pattern problem, which other methods have not yet solved.
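
The abstract does not define the largest block strategy in detail, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that "largest block" means descending from the page root into the child subtree with the most nodes, and that the "repeated pattern layer" is the first node where at least `min_repeat` children share the same simplified tag structure. The names `Node`, `TreeBuilder`, `find_repeated_pattern`, and `min_repeat` are hypothetical, and the sketch does not attempt the multi-pattern case.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag in HTML.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A bare DOM node: tag name, parent, and child elements only."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def size(self):
        # Number of element nodes in this subtree (the "block size" assumed here).
        return 1 + sum(child.size() for child in self.children)

    def signature(self):
        # Simplified structure: consecutive identical child signatures are collapsed,
        # so a list of 3 or 30 identical records yields the same signature.
        parts = []
        for child in self.children:
            sig = child.signature()
            if not parts or parts[-1] != sig:
                parts.append(sig)
        return self.tag + ("(" + ",".join(parts) + ")" if parts else "")


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open tag; ignore stray closing tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def find_repeated_pattern(html, min_repeat=3):
    """Return (container tag, repeated child signature, count), or None."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = builder.root
    while node.children:
        counts = Counter(child.signature() for child in node.children)
        sig, n = counts.most_common(1)[0]
        if n >= min_repeat:
            return node.tag, sig, n  # assumed "repeated pattern layer"
        # Assumed largest block step: follow the biggest child subtree.
        node = max(node.children, key=Node.size)
    return None


# Hypothetical result page: a small navigation bar plus three result records.
page = """
<html><body>
<div id="nav"><a href="#">Home</a><a href="#">About</a></div>
<ul>
<li><a href="/1">A</a><span>1</span></li>
<li><a href="/2">B</a><span>2</span></li>
<li><a href="/3">C</a><span>3</span></li>
</ul>
</body></html>
"""
result = find_repeated_pattern(page)
print(result)
```

On this toy page the sketch skips the smaller navigation block, settles on the `ul` element, and reports `li(a,span)` repeated three times. Real result pages would likely need text-weighted block sizes and tolerance for small structural differences between records, which this sketch leaves out.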
