Conference: International Conference on Computer Science and Education

Research on text mining algorithm based on focused crawler

Abstract

The Internet has become the world's largest information repository, and text data on the web is growing explosively. As a result, the drawbacks of web retrieval, namely the long time needed to acquire and update web pages and the low precision, have become more obvious. This paper proposes a text mining algorithm based on a focused crawler. It uses a topic crawler algorithm to classify and integrate web pages by topic as comprehensively as possible, which greatly improves the retrieval ability for web pages. On this basis, the naive Bayes algorithm is adopted to perform text mining on the web data. The experimental results show that the algorithm is feasible and achieves higher recall and precision on web pages.
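
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a multinomial naive Bayes classifier with add-one smoothing, whitespace tokenization, a breadth-first frontier, and a tiny in-memory "web" (the `pages` and `links` dictionaries, the labels `web` and `sport`, and all function names are hypothetical). The crawler follows links only from pages the classifier judges on-topic, and precision and recall are then computed against a hand-labelled relevant set.

```python
import math
from collections import Counter, defaultdict, deque


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.class_docs[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def focused_crawl(seed, pages, links, clf, topic):
    """Breadth-first crawl that expands links only from on-topic pages."""
    frontier = deque([seed])
    seen = {seed}
    kept = []
    while frontier:
        url = frontier.popleft()
        if clf.predict(pages[url]) != topic:
            continue  # off-topic page: neither kept nor expanded
        kept.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return kept


def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical training data and toy link graph, for illustration only.
train_docs = [
    "crawler index web search pages",
    "search engine crawler links pages",
    "football match goal team",
    "team player score football",
]
train_labels = ["web", "web", "sport", "sport"]

pages = {
    "a": "web crawler search pages",
    "b": "search engine index links",
    "c": "football team goal",
    "d": "crawler links web index",
    "e": "web search crawler engine",  # on-topic, but linked only from "c"
}
links = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}

clf = NaiveBayes().fit(train_docs, train_labels)
kept = focused_crawl("a", pages, links, clf, topic="web")
precision, recall = precision_recall(kept, relevant={"a", "b", "d", "e"})
print(kept, precision, recall)
```

In this toy graph the crawler never reaches page `e`, because its only inbound link comes from an off-topic page. This illustrates how pruning the frontier by topic can trade recall for crawl effort.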