Conference: IET International Conference on Smart and Sustainable City and Big Data

A WEBPAGE INFORMATION EXTRACTION METHOD BASED ON GAME THEORY


Abstract

With the development of Web 2.0, many websites, especially news websites, publish information through their own CMS (content management system). How to extract information from such differing webpages has become an increasingly popular research topic, and many researchers have proposed methods that extract valid content adaptively. In this paper we propose a method based on game theory to efficiently extract the main text from a webpage. We find the target label by means of a label game. Our method consists of two steps: (a) filtering out the script and style tags in the webpage, and then dividing the entire HTML page into blocks using the div tag; (b) extracting features from the blocks and finding the Nash equilibrium of the game-theory matrix. Experiments on a number of websites verify that our game-theory-based model is valid and performs better.
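The two steps in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the block features or the payoffs of the label game, so the features used here (text length and link density), the two-player payoff matrices, the `AGREE_BONUS` constant, and all function names are assumptions made for the example. Only the Python standard library is used.

```python
from html.parser import HTMLParser

AGREE_BONUS = 0.5  # hypothetical reward when both players pick the same block


class DivBlockSplitter(HTMLParser):
    """Step (a): skip <script>/<style> content and split the page by <div>."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished blocks: {"text": str, "link_chars": int}
        self._stack = []   # currently open <div> blocks
        self._skip = 0     # depth inside <script>/<style>
        self._in_link = 0  # depth inside <a>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "div":
            self._stack.append({"text": "", "link_chars": 0})
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "div" and self._stack:
            self.blocks.append(self._stack.pop())
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip or not text or not self._stack:
            return
        block = self._stack[-1]  # credit text to the innermost open div
        block["text"] += text + " "
        if self._in_link:
            block["link_chars"] += len(text)


def pure_nash_equilibria(row_payoff, col_payoff):
    """Return every (i, j) where neither player gains by deviating alone."""
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    found = []
    for i in range(n_rows):
        for j in range(n_cols):
            row_best = max(row_payoff[k][j] for k in range(n_rows))
            col_best = max(col_payoff[i][k] for k in range(n_cols))
            if row_payoff[i][j] == row_best and col_payoff[i][j] == col_best:
                found.append((i, j))
    return found


def extract_main_text(html):
    """Step (b), under assumed payoffs: one player values text length,
    the other values low link density, and both gain from agreeing."""
    splitter = DivBlockSplitter()
    splitter.feed(html)
    blocks = [b for b in splitter.blocks if b["text"]]
    if not blocks:
        return ""
    longest = max(len(b["text"]) for b in blocks)
    length = [len(b["text"]) / longest for b in blocks]
    clean = [1 - b["link_chars"] / len(b["text"]) for b in blocks]
    n = len(blocks)
    row = [[length[i] + (AGREE_BONUS if i == j else 0) for j in range(n)]
           for i in range(n)]
    col = [[clean[j] + (AGREE_BONUS if i == j else 0) for j in range(n)]
           for i in range(n)]
    equilibria = pure_nash_equilibria(row, col)
    # Among the equilibria, keep the one with the highest joint payoff.
    i, _ = max(equilibria, key=lambda ij: row[ij[0]][ij[1]] + col[ij[0]][ij[1]])
    return blocks[i]["text"].strip()


SAMPLE_PAGE = """
<html><head><style>div { color: red }</style><script>var x = 1;</script></head>
<body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="main">The city council approved the new transit plan on Monday after a long debate.</div>
<div id="footer">Copyright <a href="/about">About us</a></div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_PAGE))
```

With these assumed payoffs a pure-strategy equilibrium always exists, because at least one of the two players' preferred blocks is a mutual best response once the agreement bonus is counted. A real implementation would substitute the features and payoff matrix defined in the paper itself.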
