Research of Self-adaptive Web Page Parser based on Templates and Rules

Abstract

Web page parsing has attracted considerable attention in recent years. How to formulate extraction rules for subject information from a large number of web pages as quickly and accurately as possible, without human intervention, has become an important research question in this field. This paper proposes a framework for a self-adaptive web page parser based on templates and rules. First, a noise-filtering algorithm removes irrelevant and invalid nodes. Then page templates and heuristic rules are combined to generate extraction rules. In addition, an automatic detection mechanism adjusts the extraction rules dynamically in response to external factors. Parsers generated with this framework are more self-adaptive, generate better extraction rules, and locate and extract subject information more effectively. Experimental results demonstrate the effectiveness of the parser.
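The abstract names three stages but does not give their algorithms:

- noise filtering
- rule generation from page templates and heuristic rules
- automatic detection with dynamic rule adjustment

The Python sketch below is one much-simplified reading of that pipeline. It is not the authors' method. Every name in it (`NOISE_TAGS`, `filter_noise`, `generate_rule`, `apply_rule`, `extract`) is hypothetical. The sketch rests on these assumptions:

- "Irrelevant" nodes are a fixed list of tags.
- "Invalid" nodes are nodes left with no text.
- A per-site cached tag path stands in for a page template.
- A simple paragraph-text scoring rule is swapped in as the heuristic rule.
- "Detection" only checks whether the cached path still leads to paragraph text.

```python
from html.parser import HTMLParser

# Hypothetical noise list: tags assumed to carry no subject information.
NOISE_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "iframe", "form"}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # elements without a closing tag

class Node:
    """A DOM element; children are Node objects or text strings, in document order."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []

    def text(self):
        parts = (c.strip() if isinstance(c, str) else c.text() for c in self.children)
        return " ".join(p for p in parts if p)

class TreeBuilder(HTMLParser):  # lenient Node tree built on the standard-library parser
    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("root")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        n = self.cur  # close the nearest open element with this tag; ignore stray end tags
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.children.append(data)

def filter_noise(node):
    """Stage 1 (assumed): drop irrelevant nodes (NOISE_TAGS) and invalid nodes (no text left)."""
    for c in node.children:
        if isinstance(c, Node) and c.tag not in NOISE_TAGS:
            filter_noise(c)
    node.children = [c for c in node.children
                     if isinstance(c, str) or (c.tag not in NOISE_TAGS and c.text())]

def elements(node):
    for c in node.children:
        if isinstance(c, Node):
            yield c
            yield from elements(c)

def link_len(node):
    return len(node.text()) if node.tag == "a" else sum(
        link_len(c) for c in node.children if isinstance(c, Node))

def content_score(node):
    """Heuristic rule (assumed): non-link text carried by the node's direct <p> children."""
    return sum(len(c.text()) - link_len(c) for c in node.children
               if isinstance(c, Node) and c.tag == "p")

def path_of(node):
    """An extraction rule is a tag path such as 'html[0]/body[0]/div[1]'."""
    steps = []
    while node.parent is not None:
        same = [c for c in node.parent.children if isinstance(c, Node) and c.tag == node.tag]
        steps.append(f"{node.tag}[{same.index(node)}]")
        node = node.parent
    return "/".join(reversed(steps))

def apply_rule(root, rule):
    """Follow a tag path; return None if the page no longer has that structure."""
    node = root
    for step in rule.split("/"):
        tag, idx = step[:-1].split("[")
        same = [c for c in node.children if isinstance(c, Node) and c.tag == tag]
        if int(idx) >= len(same):
            return None
        node = same[int(idx)]
    return node

def generate_rule(root):
    """Stage 2 (assumed): pick the element with the best heuristic score and record its path."""
    best = max(elements(root), key=content_score, default=None)
    return path_of(best) if best is not None and content_score(best) > 0 else None

def extract(html, rules, site):
    """Stage 3 (assumed): reuse the cached rule for a site; regenerate it when detection fails."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    filter_noise(builder.root)
    rule = rules.get(site)
    node = apply_rule(builder.root, rule) if rule else None
    if node is None or content_score(node) <= 0:  # detection: no rule, or rule no longer fits
        rules[site] = rule = generate_rule(builder.root)
        node = apply_rule(builder.root, rule) if rule else None
    return node.text() if node else None
```

Suppose a site's rule is cached in the `rules` dictionary and the site's layout then changes. The cached path now lands on a node without paragraph text, so `extract` regenerates and re-caches the rule instead of returning the wrong block.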

Bibliographic details

Source: International Conference on Management and Service Science (MASS), 2009, 4 pages
Authors: Jinzhu Hu; Xing Zhou; Jiangbo Shu; Chunxiu Xiong
Original format: PDF
Chinese Library Classification: C93-53
Keywords: Self-adaptive; Web page parser; Templates; Heuristic rules; Information extraction
Date added: 2022-08-21 06:24:29

Similar literature

1. A Signal-Representation-Based Parser to Extract Text-Based Information from the Web [J]. Mu-Chun Su, Shao-Jui Wang, Chen-Ko Huang. Journal of Advanced Computational Intelligence and Intelligent Informatics, 2010, issue 5a77.
2. Excemplify: A Flexible Template Based Solution, Parsing and Managing Data in Spreadsheets for Experimentalists [J]. Lei Shi, Lenneke Jong, Ulrike Wittig. Journal of Integrative Bioinformatics, 2013, issue 2.
3. Research of Self-Adaptive Web Page Parser Based on Templates and Rules [C]. Jinzhu Hu, Xing Zhou, Jiangbo Shu. International Conference on Management and Service Science (MASS 2009), 2009.
4. Using a named entity tagger and a syntactic parser to improve Web-based answer extraction [D]. Yasser Kamel, 2004.
5. Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers [O]. Susanne Bornelöv, Simon Marillet, Jan Komorowski, 2014.
6. Pemanfaatan Website Parser Template Pada Web Crawler Untuk Membangun Metadata Pada Sistem Pencarian Berbasis Semantik (Using Website Parser Templates in a Web Crawler to Build Metadata in a Semantics-Based Search System) [O]. Nurhayati Masthurah, Taufiq Wirahman, Devi Munandar, 2008.