Chinese Web Content Extraction Based on Naive Bayes Model


Abstract

As web content extraction becomes increasingly difficult, this paper proposes a method that uses a Naive Bayes model trained on the attribute feature values of web page blocks. The method first denoises the web page, represents it as a DOM tree, and divides the page into blocks. It then uses the Naive Bayes model to obtain probability values from the statistical features of the blocks. Finally, it extracts the theme blocks and composes the page content from them. Tests show that the algorithm extracts web page content accurately, with an average accuracy of 96.2%. The method has been adopted to extract content for the off-portal search of the Hunan Farmer Training Website, with good efficiency.
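The pipeline in the abstract (denoise, split into blocks, score each block with Naive Bayes, keep the theme blocks) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the block features, the segmentation rules, or the training data, so the tag sets, the three binary features (text length, anchor-text ratio, punctuation count), the thresholds, and the toy training set below are all assumptions made for illustration. The sketch also cuts blocks at tag boundaries in a streaming parse rather than building a full DOM tree.

```python
import math
from collections import Counter
from html.parser import HTMLParser

NOISE_TAGS = {"script", "style", "noscript"}  # assumed noise tags
BLOCK_TAGS = {"div", "p", "td", "li"}  # assumed block boundaries


class BlockSplitter(HTMLParser):
    """Drop noise tags and cut the remaining text into blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # finished blocks as (text, anchor_text_chars)
        self._parts, self._link_chars = [], 0  # block being built
        self._noise = self._links = 0  # nesting depth in noise / <a> tags

    def flush(self):
        text = "".join(self._parts).strip()
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        self._tag(tag, 1)

    def handle_endtag(self, tag):
        self._tag(tag, -1)

    def _tag(self, tag, step):
        if tag in NOISE_TAGS:
            self._noise = max(0, self._noise + step)
        elif tag == "a":
            self._links = max(0, self._links + step)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if not self._noise:
            self._parts.append(data)
            self._link_chars += len(data.strip()) if self._links else 0


def features(text, link_chars):
    """Discretize three assumed block statistics into binary features."""
    punct = sum(text.count(c) for c in "，。！？；,.!?;")
    return (
        "long" if len(text) >= 30 else "short",
        "linky" if link_chars / len(text) > 0.5 else "plain",
        "punct" if punct >= 2 else "nopunct",
    )


def train(samples):
    """Count labels and (label, position, value) triples."""
    labels, triples = Counter(), Counter()
    for feats, label in samples:
        labels[label] += 1
        triples.update((label, i, v) for i, v in enumerate(feats))
    return labels, triples


def predict(model, feats):
    """Return the label with the highest log posterior (add-one smoothing)."""
    labels, triples = model
    def score(label):
        logp = math.log(labels[label] / sum(labels.values()))
        for i, v in enumerate(feats):
            # Every assumed feature is binary, hence the +2 in the denominator.
            logp += math.log((triples[label, i, v] + 1) / (labels[label] + 2))
        return logp
    return max(labels, key=score)


def extract_content(html, model):
    """Keep only the blocks the model labels as content."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n".join(text for text, links in parser.blocks
                     if predict(model, features(text, links)) == "content")


# Toy hand-labelled training set, invented for illustration.
CONTENT = [("long", "plain", "punct")] * 2 + [("long", "plain", "nopunct")]
NOISE = [("short", "linky", "nopunct")] * 2 + [("short", "plain", "nopunct")]
TRAIN = [(f, "content") for f in CONTENT] + [(f, "noise") for f in NOISE]
PAGE = (
    "<html><head><script>var x = 1;</script></head><body>"
    '<div><a href="/">Home</a> <a href="/news">News</a></div>'
    "<p>Farmers can enroll in the spring course online. "
    "Classes start in March, and seats are limited.</p>"
    "<div>Copyright 2014</div></body></html>"
)
model = train(TRAIN)
print(extract_content(PAGE, model))
```

On this toy page the navigation links and the copyright line are classified as noise and only the paragraph survives. A real system would need labelled blocks from actual pages and features chosen for Chinese text, which the abstract does not specify.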


Bibliographic record

Source
IFIP WG 5.14 International Conference on Computer and Computing Technologies in Agriculture | 2014 | 10 pages
Authors
Wang Jinbo; Wang Lianzhi; Gao Wanlin; Yu Jian; Cui Yuntao
Original format: PDF
Chinese Library Classification: Basic agricultural sciences
Keywords
Web Content Extraction; DOM Tree; Page Segmentation; Naive Bayes Model

Date indexed: 2022-08-20 22:38:52

Similar literature

1. Landslide susceptibility mapping using an ensemble model of Bagging scheme and random subspace-based naive Bayes tree in Zigui County of the Three Gorges Reservoir Area, China [J]. Hu Xudong, Huang Cheng, Mei Hongbo. Bulletin of Engineering Geology and the Environment, 2021, no. 7
2. Intelligent Naive Bayes-based approaches for Web proxy caching [J]. Waleed Ali, Siti Mariyam Shamsuddin, Abdul Samad Ismail. Knowledge-Based Systems, 2012
3. A Novel Strategy for a Vertical Web Page Classifier Based on Continuous Learning Naive Bayes Algorithm [J]. H.A. Ali, A.I. El-Desouky, A.I. Saleh. International Journal of Computers & Applications, 2007, no. 3
4. Chinese Web Content Extraction Based on Naive Bayes Model [C]. Wang Jinbo, Wang Lianzhi, Gao Wanlin. IFIP WG 5.14 International Conference on Computer and Computing Technologies in Agriculture, 2014
5. Modelling Stress Levels Based on Physiological Responses to Web Contents [D]. Isiaka, Fatima. 2017
6. Internet-based health education in China: a content analysis of websites [O]. Ying Peng, Xi Wu, Salla Atkins. 2014
7. Extraction Model Based on Web Format Information Quantity in Blog Post and Comment Extraction [O]. 曹冬林, 廖祥文, 许洪波. 2009
