Conference: International Conference on Network and Information Systems for Computers

The Data Crawling and Hotspot Analyze of Social QA Site


Abstract

With the rapid development of the Internet, more specialized and detailed information sources such as Q&A sites have emerged. On these social Q&A platforms, hot topics and news are discussed, and even created, every minute. Analyzing the content of social Q&A platforms is therefore of practical significance for understanding hot social issues. Taking one social Q&A platform as the research subject, this paper analyzes the difficulties of crawling data from the platform and the corresponding solutions, then designs and implements a data crawling system consisting of a user information storage module, a highly anonymous, high-availability proxy maintenance module, a node crawling and parsing module, and a data storage module. With these modules, the system can crawl and store data without being restricted by the platform. On this basis, the paper designs and implements a hotspot parsing and grading module. A historical hotspot display module and a trending hotspot display module, both built on ECharts, show the platform's historical and trending hotspots. Using the proposed crawling system and the hotspot analysis and display system, the paper obtains data on 31,520 regularized independent topics and real-time data on 979,815 questions from the platform, and presents historical and trending hotspot analyses based on these data. The experimental results show that the system fully meets its design objectives. Finally, the paper summarizes the proposed data crawling and hotspot analysis system and outlines directions for future work.
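The abstract names a proxy maintenance module but does not say how it works. As a minimal sketch of one common approach, assuming a pool that drops a proxy after a fixed number of consecutive failures, the idea could look like the following. The class `ProxyPool`, its methods, the `max_failures` threshold, and the example addresses are all hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ProxyPool:
    """Hypothetical pool that evicts proxies after repeated failures."""

    max_failures: int = 3  # assumed threshold, not from the paper
    # Maps proxy address -> number of consecutive failed requests.
    _failures: dict = field(default_factory=dict, init=False)

    def add(self, proxy: str) -> None:
        """Register a proxy; a proxy already in the pool keeps its count."""
        self._failures.setdefault(proxy, 0)

    def pick(self) -> str:
        """Return a random proxy that is still considered usable."""
        if not self._failures:
            raise LookupError("proxy pool is empty")
        return random.choice(list(self._failures))

    def report(self, proxy: str, ok: bool) -> None:
        """Record the outcome of a request made through `proxy`."""
        if proxy not in self._failures:
            return  # already evicted or never added
        if ok:
            self._failures[proxy] = 0  # a success resets the streak
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures:
            del self._failures[proxy]  # evict an unreliable proxy

    def __len__(self) -> int:
        return len(self._failures)
```

A crawler built this way would call `pick()` before each request and `report()` afterwards, so that blocked or dead proxies leave the pool over time. Checking that a proxy is actually highly anonymous would need a separate probe, which this sketch does not attempt.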
