An Approach of Web Page Information Extraction

Abstract

The Web has become the largest information source, but noise content is an inevitable part of any web page. Noise content reduces the accuracy of search engines and increases server load. Information extraction technology has been developed in response, and it is mostly based on page segmentation. After analyzing the existing page segmentation methods, this paper proposes an approach to web page information extraction in which block nodes are identified by analyzing the attributes of HTML tags. The algorithm is easy to implement, and experiments show that it performs well.
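The abstract does not give the paper's actual rules, so the following is only a minimal sketch of the general idea of identifying block nodes from HTML tag attributes. The set of block tags, the noise keywords matched against `id` and `class`, and the names `BlockExtractor` and `main_content` are all assumptions made for illustration, not the authors' algorithm.

```python
from html.parser import HTMLParser

# Illustrative assumptions, not taken from the paper:
BLOCK_TAGS = {"div", "table", "td", "ul", "ol", "p", "section", "article"}
NOISE_HINTS = ("nav", "menu", "footer", "sidebar", "banner", "advert")


class BlockExtractor(HTMLParser):
    """Collect the text of each block node and mark likely noise blocks."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open block nodes, innermost last
        self.blocks = []  # finished blocks as (tag, is_noise, text)

    def handle_starttag(self, tag, attrs):
        if tag not in BLOCK_TAGS:
            return
        attr_map = dict(attrs)
        # Join the id and class attribute values into one lowercase label.
        label = " ".join(
            filter(None, (attr_map.get("id"), attr_map.get("class")))
        ).lower()
        # A block is noise if its label matches a hint or its parent is noise.
        inherited = bool(self.stack) and self.stack[-1]["noise"]
        noise = inherited or any(hint in label for hint in NOISE_HINTS)
        self.stack.append({"tag": tag, "noise": noise, "text": []})

    def handle_endtag(self, tag):
        if tag not in BLOCK_TAGS:
            return
        # Find the innermost open block with this tag; ignore stray end tags.
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index]["tag"] == tag:
                break
        else:
            return
        # Close that block and any unclosed blocks nested inside it.
        while len(self.stack) > index:
            node = self.stack.pop()
            self.blocks.append(
                (node["tag"], node["noise"], " ".join(node["text"]))
            )

    def handle_data(self, data):
        # Text is credited to the innermost open block only.
        text = data.strip()
        if text and self.stack:
            self.stack[-1]["text"].append(text)


def main_content(html):
    """Return the text of blocks that were not flagged as noise."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return [text for _, noise, text in parser.blocks if not noise and text]


if __name__ == "__main__":
    page = (
        '<div id="nav"><ul><li>Home</li><li>About</li></ul></div>'
        '<div class="content"><p>Main article text.</p></div>'
        '<div class="footer">Copyright</div>'
    )
    print(main_content(page))
```

The keyword test is a plain substring match, so it is deliberately crude (for example, `nav` also matches `canvas`). A practical system would likely combine attribute cues with other signals such as link density or block size.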

Bibliographic record

Source: International Conference on Computer Science and Electronics Engineering, 2013, 3 pages
Authors: Yaohui Li; Lixia Wang; Jianxiong Wang; Jie Yue; Mingzhan Zhao
Original format: PDF
Chinese Library Classification: TP3-53
Keywords: Information extraction; DOM; Page segmentation; HTML tag

Similar literature

1. Web Data Extraction Approach for Deep Web using WEIDJ [J]. Ily Amalina Ahmad Sabri, Mustafa Man, Wan Aezwani Wan Abu Bakar. Procedia Computer Science, 2019, No. 1.
2. WPBL: A Webpage Block Labeling Based Approach for Web Information Extraction [J]. Naizhou Zhang, Shijun Li, Zhuo Zhang. Journal of Information and Computational Science, 2010, No. 1.
3. The Web-OEM approach to Web information extraction [J]. Luca Iocchi. Journal of Network and Computer Applications, 1999, No. 4.
4. Improving Webpage Content Extraction by extending a novel single page extraction approach: A case study with Thai websites [C]. Thanadechteemapat Wigrai, Chun Che Fung. ICMLC; International Conference on Machine Learning and Cybernetics, 2012.
5. Improving Web Security by Automated Extraction of Web Application Intent [D]. Bisht, Prithvi Pal Singh. 2011.
6. A Comprehensive Approach Limiting Extractions under General Anesthesia Could Improve Oral Health [O]. Nicolas Decerle, Pierre-Yves Cousson, Emmanuel Nicolas. 2020.
7. Improving Webpage Content Extraction by extending a novel single page extraction approach: A case study with Thai websites [O]. Thanadechteemapat W., Fung C.C. 2012.
