Research on link blocks recognition of web pages

Abstract

The link block is a typical block structure in web pages and an important research object in web data mining. First, the block and the block tree are proposed as the basic concepts for the subsequent work, and an approach to building block trees is put forward. Second, based on the concept of the block, four rules for discriminating link blocks and two indicators for evaluating recognition results are presented. Finally, a strategy named the forward algorithm for discovery of link blocks (FAD) is proposed, and an experiment with different parameters is performed to verify it. The results show that FAD can flexibly recognize link blocks at different granularities. The concepts and approaches presented in this paper show promise for web data processing tasks such as advertising block recognition and web content extraction.
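
The abstract does not spell out the block tree construction, the four discrimination rules, or the FAD procedure itself, so the following is only a minimal illustrative sketch of the general idea, not the paper's method. It assumes a block is any element whose tag is in a hypothetical `BLOCK_TAGS` set, and it substitutes a simple stand-in heuristic (a minimum link count plus a minimum ratio of anchor text to all text) for the paper's rules. The names `Block`, `build_block_tree`, `find_link_blocks`, `min_links`, and `min_ratio` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit blocks. The paper's own definition may differ.
BLOCK_TAGS = {"body", "div", "ul", "ol", "table", "nav", "section"}


class Block:
    """One node of the block tree, with statistics over its whole subtree."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text_len = 0       # characters of visible text in the subtree
        self.link_text_len = 0  # characters of anchor text in the subtree
        self.link_count = 0     # number of <a> elements in the subtree


class _BlockTreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Block("root")
        self.current = self.root
        self.anchor_depth = 0

    def _bump(self, field, amount):
        # Propagate a statistic to the current block and all its ancestors.
        node = self.current
        while node is not None:
            setattr(node, field, getattr(node, field) + amount)
            node = node.parent

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            node = Block(tag, attrs, self.current)
            self.current.children.append(node)
            self.current = node
        elif tag == "a":
            self.anchor_depth += 1
            self._bump("link_count", 1)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            # Close the nearest open block with this tag; ignore stray end tags.
            node = self.current
            while node is not self.root and node.tag != tag:
                node = node.parent
            if node is not self.root:
                self.current = node.parent
        elif tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())
        if n == 0:
            return
        self._bump("text_len", n)
        if self.anchor_depth:
            self._bump("link_text_len", n)


def build_block_tree(html):
    builder = _BlockTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_link_blocks(block, min_links=3, min_ratio=0.8):
    """Top-down search: report the outermost blocks that pass the heuristic."""
    ratio = block.link_text_len / block.text_len if block.text_len else 0.0
    if block.tag != "root" and block.link_count >= min_links and ratio >= min_ratio:
        return [block]  # do not descend: this whole block counts as a link block
    found = []
    for child in block.children:
        found.extend(find_link_blocks(child, min_links, min_ratio))
    return found


SAMPLE = """
<body>
<div id="nav"><ul>
  <li><a href="/a">Home</a></li>
  <li><a href="/b">News</a></li>
  <li><a href="/c">About</a></li>
</ul></div>
<div id="main"><p>A long paragraph of body text with one <a href="/x">link</a> inside it.</p></div>
</body>
"""

if __name__ == "__main__":
    tree = build_block_tree(SAMPLE)
    for b in find_link_blocks(tree):
        print(b.tag, b.attrs.get("id"), b.link_count)
```

In this sketch, lowering `min_ratio` lets larger enclosing blocks qualify, which loosely mirrors the abstract's point about recognizing link blocks at different granularities. The thresholds are arbitrary and would need tuning on real pages.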