Research on text mining algorithm based on focused crawler

Abstract

The Internet has become the world's largest information repository. With the explosive growth of text data on the web, two drawbacks have become more obvious: acquiring and updating web pages takes much more time, and precision is not high. This paper proposes a text mining algorithm based on a focused crawler. It uses a topic crawler algorithm to classify and integrate as many web pages as possible by topic, which greatly improves the ability to retrieve web pages. On this basis, a naive Bayes algorithm is applied to perform text mining on the web data. The experimental results show that the algorithm is feasible and achieves higher recall and precision on web pages.
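
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: a crawler that expands links only from pages a naive Bayes classifier judges to be on topic, evaluated by precision and recall. Everything in it is an assumption made for illustration, not the authors' method: the toy training set, the in-memory `PAGES` and `LINKS` graph (standing in for real HTTP fetching), the label names, and the helper names `NaiveBayes`, `focused_crawl`, and `precision_recall`.

```python
import math
from collections import Counter, defaultdict, deque


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs):
        for text, label in docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def focused_crawl(pages, links, seeds, clf, topic):
    """Breadth-first crawl that follows links only from on-topic pages."""
    seen, queue, relevant = set(seeds), deque(seeds), []
    while queue:
        url = queue.popleft()
        if clf.predict(pages[url]) != topic:
            continue  # off-topic page: do not expand its links
        relevant.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return relevant


def precision_recall(retrieved, truly_relevant):
    hits = len(set(retrieved) & set(truly_relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(truly_relevant) if truly_relevant else 0.0
    return precision, recall


# Toy data, invented for this sketch.
TRAIN = [
    ("crawler fetches web pages and follows links", "crawling"),
    ("focused crawler ranks links by topic", "crawling"),
    ("recipe for tomato soup with basil", "other"),
    ("football match ended in a draw", "other"),
]
PAGES = {
    "a": "focused crawler follows topic links",
    "b": "web crawler fetches pages",
    "c": "tomato soup recipe",
    "d": "crawler ranks web pages",
}
# Page "d" is on topic but reachable only through the off-topic page "c".
LINKS = {"a": ["b", "c"], "b": [], "c": ["d"]}

clf = NaiveBayes()
clf.fit(TRAIN)
found = focused_crawl(PAGES, LINKS, ["a"], clf, "crawling")
precision, recall = precision_recall(found, {"a", "b", "d"})
print(found, precision, recall)
```

In this toy graph the crawler never reaches page `d`, because the only link to it sits on an off-topic page. That is the usual trade-off of pruning by topic: less fetching and higher precision, possibly at some cost in recall.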

Bibliographic details

Source: International Conference on Computer Science and Education, 2017, pp. 454-457 (4 pages)
Authors: Qiusheng Zhang; Mingyu Lin; Jianping Jun; Xingyun Zhang
Original format: PDF
Keywords: Computer science; Education

