An Effective Method to Extract Web Content Information

Pan Suhan; Li Zhiqiang; Dai Juan


Abstract

To simplify web text content extraction and improve its accuracy, a new extraction method based on text-punctuation distribution and tag features (TPDT) is proposed. The method combines the distribution of text punctuation with tag features: it calculates the text-punctuation density of each text block and takes the maximum continuous sum of densities to extract the best text content from a web page. The method effectively solves the problem of noise filtering and text content extraction without training or manual processing. Experimental results on web pages randomly selected from different portal websites show that the TPDT method has good applicability to various news pages.
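The abstract gives no formulas, so the following Python sketch only illustrates the general idea it describes: split a page into text blocks using tags, score each block by its punctuation, and keep the contiguous run of blocks with the maximum summed score. The punctuation set, the list of block-level tags, the `threshold` parameter, the scoring rule (punctuation count minus a threshold), and all function and class names are assumptions made for illustration; they are not taken from the TPDT paper.

```python
from html.parser import HTMLParser

# Assumed punctuation set (Western and Chinese marks); not from the paper.
PUNCTUATION = set(",.;:!?，。；：！？、")
# Assumed tag features: tags that delimit text blocks, and tags whose content is skipped.
BLOCK_TAGS = {"p", "div", "li", "td", "tr", "br", "h1", "h2", "h3", "ul", "ol", "table"}
SKIP_TAGS = {"script", "style"}


class BlockSplitter(HTMLParser):
    """Collects visible text into blocks delimited by block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip = 0

    def _flush(self):
        # Normalize whitespace and store the block if it is non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def block_score(text, threshold=1.0):
    """Hypothetical density score: punctuation count minus a threshold,
    so blocks with little punctuation (menus, footers) score negative."""
    return sum(ch in PUNCTUATION for ch in text) - threshold


def max_contiguous_run(scores):
    """Kadane's algorithm: return (start, end) of the slice with the largest sum.
    Returns (0, 0) when no block has a positive score."""
    best_sum, best_start, best_end = 0.0, 0, 0
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            cur_sum, cur_start = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_start, best_end


def extract_main_text(html, threshold=1.0):
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    scores = [block_score(b, threshold) for b in splitter.blocks]
    start, end = max_contiguous_run(scores)
    return "\n".join(splitter.blocks[start:end])


SAMPLE = """<html><body>
<div>Home | News | Sports</div>
<p>The city council met on Monday, and members voted on the budget.</p>
<p>Critics said the plan, which cuts transit funding, was rushed; supporters disagreed.</p>
<div>Copyright 2018</div>
</body></html>"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE))
```

On the sample page, the navigation and copyright blocks contain no punctuation from the assumed set and score negative, so the selected run covers only the two paragraphs. Subtracting a threshold is one simple way to make a maximum-sum search meaningful; the paper may define density and the continuous sum differently.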


Bibliographic details

Source: Journal of Software, 2018, Issue 11, 9 pages
Authors: Pan Suhan; Li Zhiqiang; Dai Juan
Affiliation: College of Information Engineering, Yangzhou University, Yangzhou, China
Original format: PDF
CLC classification: Computing technology, computer technology
Added to database: 2022-08-18 17:09:18

