Conference: International Conference on Communication and Electronics Systems

Relevant Data Node Extraction: A Web Data Extraction Method for Non Contagious Data


Abstract

The Internet is expanding rapidly, and millions of HTML pages are created daily. These pages are generated by content management systems such as WordPress and Joomla, or by other software programs. Such programs query data from one or more associated databases and then fill a page template with that data, producing well-structured data regions, which we call data nodes. These data nodes are vital because they provide information about all the structured data on a page. This paper proposes a novel technique to detect and extract structured data from web pages: Relevant Data Node Extraction (RDNE), which automatically mines relevant data nodes from HTML pages. The algorithm is based on a set of rules that were observed and then implemented. The proposed technique showed excellent results.
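
The abstract does not state the rules that RDNE applies, so the following is only a rough illustration of the general idea that template-filled pages contain repeated, similarly shaped subtrees. It is a minimal sketch of a generic repeated-sibling heuristic, not the authors' algorithm. The names Node, TreeBuilder, find_repeated_groups, the min_repeat threshold, and the sample page are all assumptions made for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """A minimal DOM element (hypothetical helper for this sketch)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # child Nodes and text strings, in document order

    @property
    def children(self):
        return [i for i in self.items if isinstance(i, Node)]

    def signature(self):
        # Structural shape only: nested tag names, ignoring text and attributes.
        return (self.tag, tuple(c.signature() for c in self.children))

    def all_text(self):
        parts = [i if isinstance(i, str) else i.all_text() for i in self.items]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.items.append(text)


def find_repeated_groups(html, min_repeat=3):
    """Return the text of sibling groups that share one structural shape.

    Assumption: a run of at least `min_repeat` same-shaped siblings that all
    contain text is treated as a candidate group of data nodes.
    """
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    groups = []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        by_shape = defaultdict(list)
        for child in node.children:
            by_shape[child.signature()].append(child)
        for members in by_shape.values():
            texts = [m.all_text() for m in members]
            if len(members) >= min_repeat and all(texts):
                groups.append(texts)
        stack.extend(node.children)
    return groups


# Invented sample page: a short navigation list and three templated records.
SAMPLE = """
<html><body>
  <ul><li>Home</li><li>About</li></ul>
  <div id="list">
    <div class="item"><h2>Alpha</h2><span>10</span></div>
    <div class="item"><h2>Beta</h2><span>20</span></div>
    <div class="item"><h2>Gamma</h2><span>30</span></div>
  </div>
</body></html>
"""

if __name__ == "__main__":
    for group in find_repeated_groups(SAMPLE):
        print(group)
```

On the sample page the sketch reports the three item records as one group and skips the two-entry navigation list, because that list falls below the min_repeat threshold. A real extractor would need further rules to separate relevant data nodes from repeated noise such as menus and advertisements.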
